PREFACE



This volume is concerned with the alignment between the way the
mathematical performance of students is assessed and the reform
agenda in school mathematics. The central feature of the current
reform efforts involves an epistemological shift from the mastery
of a set of concepts and procedures to mathematical power. The
term mathematical power means "an individual's abilities to
explore, conjecture, and reason logically, as well as the ability to
use a variety of mathematical methods effectively to solve
nonroutine problems. This notion is based on recognition of
mathematics as more than a collection of concepts and skills to be
mastered; it includes methods of investigating and reasoning,
means of communication, and notions of context. In addition, for
each individual, mathematical power involves the development of
personal self-confidence" (National Council of Teachers of
Mathematics, 1989, p. 5).

The term authentic assessment has been chosen to convey
two ideas. First, because the word authentic implies "conforming
to reality; TRUSTWORTHY" (Webster's New Collegiate Dictionary,
1987, p. 117), assessments of student performance should be
trustworthy indicators of mathematical power (i.e., how well can
students solve nonroutine problems?). Second, the term has
been used for political purposes to imply that conventional tests
are not trustworthy indicators of mathematical power. They are
"inauthentic."

The seven chapters in this book have been prepared to raise a
set of issues that scholars are addressing during this period of
transition from traditional schooling practices toward the reform
vision of school mathematics. The primary audience for this book
consists of researchers in mathematics learning and teaching. It is
anticipated that university professors and their graduate students
will use the volume as a basis for discussion and potential studies
during the next decade. However, because of the growing
importance of assessment practices in schools, one would expect that
many educational administrators, testing directors, and teachers of
mathematics will find the chapters enlightening.

In chapter 1, Thomas Romberg and Linda Wilson raise a set of
issues that they believe need to be addressed in order to build an
assessment system for school mathematics. Susanne Lajoie presents
in chapter 2 the argument about the need for authentic forms
of assessment. In chapter 3, Edward Silver and Patricia Kenney
focus on the importance of assessment information for making
instructional decisions. Jan de Lange illustrates in chapter 4 a
variety of authentic tasks used in The Netherlands to assess
different levels of mathematical performance. In chapter 5, Robert
Stake constructs a broad argument on the reasons why standardized
testing is invalid in the current context of reform. Mark
Wilson, in chapter 6, presents an alternative psychometric
approach to mathematics assessment. Finally, in chapter 7, Elizabeth
Crane reflects on the six previous chapters and points toward the
need to extend the discussions and broaden our view of possibilities
that need to be considered.

A number of people and organizations are responsible for
making the publication of this book possible. The writing of
individual chapters and the editing and preparation of the book
were supported by the Office of Educational Research and
Improvement of the U.S. Department of Education through the support of
the National Center for Research in Mathematical Sciences
Education. The Wisconsin Center for Education Research provided the
ancillary services so necessary for this type of project. Andrew
Porter, the director of the Wisconsin Center, and Jerry Grossman,
business manager of the center, are thanked for their continued
support. Joan Pedro is thanked for attending to many of the
administrative details involved in its publication. Debra Torgerson
is thanked for her work in typing and retyping manuscripts. Finally,
special thanks go to Margaret Powell for the careful editing that
contributed immeasurably to the clarity and quality of the writing
of this book.
1 ❖ Issues Related to the Development
of an Authentic Assessment
System for School Mathematics

Thomas A. Romberg and
Linda D. Wilson



In 1990 the president of the United States and the National 
Governors Association announced their unprecedented agreement 
on national educational goals. For the nation to achieve those goals, 
it has become apparent that the American education system must
be restructured. The strategy now being followed involves a series
of steps to produce: a detailed set of content standards in English,
mathematics, science, history, and geography; a set of standards
describing how best to instruct students toward the attainment of
each of those content standards; a set of procedures to assess
student progress in meeting the content standards; and a set of
standards to describe the responsibility of the professionals who
will assist students in reaching those standards. The framework for
restructuring schools via the specification of content standards,
teaching standards, performance standards, and their
interrelationships is shown in figure 1.1.



Figure 1.1. Relationships between content, instructional, and
performance standards and an assessment system. (Diagram labels:
Content Standards, NCTM (1989); Instructional Standards, NCTM (1991);
Performance Standards, Tasks, and Scoring Procedures, grouped as the
Assessment System.)
As shown in the figure, the initial stages of the framework
have been addressed in mathematics. The Curriculum and
Evaluation Standards for School Mathematics (National Council of
Teachers of Mathematics, 1989) presents a consensual vision of the
mathematical content that all students should have an opportunity
to learn: the content standards in figure 1.1. Furthermore, the
Professional Standards for Teaching Mathematics (National
Council of Teachers of Mathematics, 1991) describes the means for
assisting students to learn that content: the instructional
standards in figure 1.1. In addition, some of the needed elements for the
development of an assessment system are described in the
Curriculum and Evaluation Standards. NCTM has now taken on the task
of producing "assessment standards." They are due to be published
in 1995.

The progress made in the content area of mathematics is 
being held up as exemplary for the other four disciplines. Neverthe- 
less, a number of critical issues need to be addressed if the assess- 
ment system is to fit with the vision of the work done so far. In 
addition, a great deal of work remains to be done in the develop- 
ment of performance standards, assessment tasks, scoring, and 
reporting procedures. In this chapter, we identify a number of 
issues that we see as especially significant. Beginning with some 
assumptions upon which the reform movement in mathematics is 
constructed, we discuss what needs to be considered at each stage 
of development of an assessment system for mathematics for the
result to be considered "authentic." 



ISSUE 1: UNDERLYING ASSUMPTIONS ABOUT THE
NATURE OF MATHEMATICS

An authentic assessment system for school mathematics should 
begin with a vision about the nature of mathematics that is aligned 
with current thinking. An assumption underlying the development
of any assessment system is that the responses a student
makes to a set of test items or tasks will be a valid indicator of that 
student's understanding of some aspect of a domain of mathemat- 
ics. There are three fundamental problems with this assumption. 
First, as Antoine Bodin (1993) has argued, one can never know what
a student truly understands. One can only make inferences based
on the responses a student makes to the tasks administered. This
implies that the creation and selection of tasks is critical to the
assessment process; in particular, they must reflect important
aspects of mathematics a student has had an opportunity to learn.
The second problem involves the reliability of the responses to
those tasks so that a reasonable indicator of a student's
understanding can be inferred. Together these lead to the final problem: What
does one mean by an understanding of mathematics? The new and
emerging answer to this question is at the heart of the calls to 
develop an "authentic" assessment system. To clarify this issue, 
we have chosen to describe the classical testing paradigm followed 
in the United States and point out its weaknesses with respect to 
current notions about the nature of mathematics. 

Traditional norm-referenced standardized achievement tests 
for mathematics are created by following a particular measurement 
model. Such tests are made up of an assortment of independent, 
discrete questions that can be responded to quickly; all items are 
assumed to be equivalent; answers (usually derived by choosing
among alternatives) are judged to be either correct or incorrect; and
responses should be internally consistent, reflect important
variations in responses between students, and be fair to all examinees.
Such tests resolve the three problems mentioned previously by
selecting or creating items that reflect specific concepts or
procedures that appear in widely used textbooks; carefully considering
a logical, hierarchical sequence of concepts and procedures; and
having a group of teachers and mathematics educators judge their
face validity. Reliability is established first by eliminating items
that are too easy, too hard, or do not correlate with other items;
then an internal consistency coefficient is calculated. Finally,
counting the number of correct responses on a test constructed in
this manner is assumed to be a reasonable indicator of a student's
knowledge, and differences in the number of correct responses
among students are assumed to reflect differences in knowledge.
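
As a rough illustration of the reliability procedure just described, the
sketch below computes item difficulty (the proportion of examinees
answering each item correctly) and one internal consistency coefficient
for right/wrong items. The text does not name a specific coefficient;
Kuder-Richardson formula 20 (KR-20) is assumed here as a common choice,
and the response data and function names are hypothetical.

```python
from statistics import pvariance


def item_difficulties(responses):
    """Proportion of examinees answering each item correctly.

    `responses` is a list of rows, one per examinee, of 0/1 item scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students
            for j in range(n_items)]


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items."""
    k = len(responses[0])                      # number of items
    p = item_difficulties(responses)           # proportion correct per item
    total_scores = [sum(row) for row in responses]
    var_total = pvariance(total_scores)        # variance of number-correct scores
    sum_pq = sum(pj * (1 - pj) for pj in p)    # sum of item variances
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: five examinees, four items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difficulties = item_difficulties(responses)
reliability = kr20(responses)
print(difficulties, round(reliability, 3))
```

In a screening step of the kind described, items with difficulties near
0 or 1 would be candidates for removal before the coefficient is
computed; the cutoffs used for that step are not specified in the text.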

The calls to develop an "authentic" assessment system are
based on the conviction that counting the number of correct
answers to a series of brief questions contradicts current views of
mathematics as an intellectual discipline. Ernest (1991), for
example, argues that mathematics cannot be described by a single
unique hierarchical structure and that mathematics cannot be
represented as a set of discrete knowledge components. The
mathematician William Thurston (1990) uses the metaphor of a tree to
describe mathematics: "Mathematics isn't a palm tree, with a 
single long straight trunk covered with scratchy formulas. It's a 
banyan tree, with many interconnected trunks and branches — 
a banyan tree that has grown to the size of a forest, inviting us 
to climb and explore" (p. 7). A valid system for assessment in
mathematics must reflect these notions: that mathematics is a set
of rich, interconnected ideas. To be in line with current thinking,
it must view mathematics as a dynamic, continually expanding
field of human creation, a cultural product (Ernest, 1988).

In the NCTM Standards (1989), the development of
mathematical power is presented as the central goal of school
mathematics. Mathematical power is defined as the ability to "explore,
conjecture, and reason logically, as well as the ability to use a
variety of mathematical methods effectively to solve nonroutine
problems" (NCTM, 1989, p. 5). The term is based on a recognition
that mathematics is more than a static collection of discrete
concepts and skills to be mastered. Doing mathematics includes
such dynamic and integrative activities as discovering, exploring,
conjecturing, sense making, and proving. Students who possess
mathematical power should be able to investigate and reason,
communicate ideas, and take real contexts of problems into
account. The descriptive verbs used in the Standards evoke images
of mathematics as a progressive human activity.

If one considers mathematics to be a static, linearly ordered set
of discrete facts, then the logical choice for a valid assessment system
is the traditional standardized achievement test. On the other hand,
if one views mathematics as a dynamic set of interconnected,
humanly constructed ideas, then the assessment system must allow 
students to engage in rich activities that include problem solving, 
reasoning, communications, and making connections. 

ISSUE 2: UNDERLYING ASSUMPTIONS ABOUT THE
LEARNING OF MATHEMATICS

It is critical that, in addition to being based on certain beliefs about
the nature of mathematics, an assessment system be built on
current views of learning mathematics. A recent study by Shepard
(1991) showed that approximately half of all district testing
directors in the United States hold beliefs about the alignment of tests
with curriculum and teaching that are based on behaviorist
learning theory, which requires sequential mastery of constituent skills
and behaviorally explicit testing of each learning step. Such a
learning theory was prevalent for several decades, but is now out of
step with current research.

Indeed, as Romberg, Zarinnia, and Collis (1990) noted, the
values and forces that dominated mathematics education for the
past century (e.g., behaviorism) are embedded in the theoretical
structures of prevailing methods of assessment. Tests built on
behavioral objectives and a content-by-process matrix are based on
behaviorist ideas about learning: that content can be broken down
into small segments to be mastered by the learner in a linear,
sequential fashion.

Yet a substantial body of evidence from cognitive psychology 
shows this hierarchical model of learning to be obsolete. The 
metaphor of the learner as a passive absorber of linearly ordered bits 
of information is contradicted by research findings from
psychology. Resnick (1987) has argued that learning does not occur by
passive absorption alone, but rather in many situations learners
approach a new task with prior knowledge, assimilate new
information, and construct their own meanings. Ernest (1991) has
shown that the uniqueness of learning hierarchies in mathematics
is confirmed neither theoretically nor empirically. Furthermore, he
argues against the notion that concepts in mathematics can be
either present or "lacking" in a learner.

The shift in learning theory can best be summarized as a move
from behaviorism to constructivism. Though there is not total
agreement in the mathematics education community about
exactly what a "constructivist" theory of learning entails, Peterson
(in press) has described four basic assumptions that form the
foundation for current theory, research, policy, and practice in
mathematics education:

■ Learners are knowledgeable "sense makers." 

■ Learning involves the negotiation of shared meaning.

■ Knowing is contextualized or situated. 

■ Assumptions about knowledge influence learning. 

A more appropriate metaphor for learning may be an image
that is gradually brought into sharper focus as the learner makes
connections, or perhaps like a mosaic, with specific bits of
knowledge situated within some larger design that is continually being
reorganized or redesigned in an organic manner. In either case, the
emphasis is on knowing, rather than "knowing that." The
Standards express it as a process: "'Knowing' mathematics is 'doing'
mathematics. A person gathers, discovers, or creates knowledge in
the course of some activity having a purpose. This active purpose
is different from mastering concepts and procedures. We do not
assert that informational knowledge has no value, only that its
value lies in the extent to which it is useful in the course of some
purposeful activity. It is clear that the fundamental concepts and
procedures from some branches of mathematics should be known
by all students. Established concepts and procedures can be relied
on as fixed variables in a setting in which other variables may be
unknown. But instruction should persistently emphasize 'doing'
rather than 'knowing that'" (NCTM, 1989, p. 7). Assessment,
then, should be based on a view of the learning of mathematics as 
a socially constructed process, not a fixed hierarchy of skills and 
concepts to be mastered. 



ISSUE 3: THE NEED FOR NEW PSYCHOMETRIC
MODELS

"It is only a slight exaggeration to describe the test theory that 
dominates educational measurement today as the application of 
twentieth century statistics to nineteenth century psychology" 
(Mislevy, 1990, abstract). When Mislevy wrote those words in
1990 he was calling for the field of psychometrics to "catch up"
with the advances in cognitive psychology. As noted earlier in
Shepard's (1991) work, many psychometricians are still operating
under theories of learning and measurement that are out of date. 
New knowledge of how learning takes place must be accounted for 
in psychometric theory. "Learners become more competent not 
simply by learning more facts and skills, but by reconfiguring their 
knowledge; by 'chunking' information to reduce memory loads; 
and by developing strategies and models that help them discern 
when and how facts and skills are important. Neither classical 
test theory nor item response theory (IRT) is designed to inform
educational decisions conceived from this perspective" (Mislevy,
Yamamoto, & Anacker, 1992). Just as an assessment system must
be built on current learning theory, so must the psychometric
measurement theories that support such a system be designed with
cognitive psychology as their base. Fortunately, work toward new
theories of testing is being accomplished, as some
psychometricians are now realizing. Wilson (1992) expressed the need this way:
"The consequence of this view of learning [constructivism] is that
we can no longer use an atomistic model for assessment. We must
assess the level of complexity of student understanding, not just 
the number of facts that students can pick out of a multiple-choice 
lest " 1 22 V New models, such as those described bv Mark Wilson 

in chapter 6, are being constructed that begin to capture more of the
complexity of learning than was allowed for by standard test
theory. Although no new model claims to describe all of the
nuances of current learning theory, with the support of more
powerful technologies progress is being made (Glaser, Lesgold, &
Lajoie, 1987; Mislevy, Yamamoto, & Anacker, 1992). It is critical
that an authentic assessment system take such work into account 
in its design. 



ISSUE 4: ALIGNMENT WITH THE REFORM 
CURRICULUM 

As described in figure 1.1, the first stages of the building of an
assessment system for mathematics (that is, setting content and
instructional standards) have been accomplished. Consensus has
been reached in the mathematics education community about the
content that all students should be given the opportunity to learn
and the pertinent means of instruction. As the next four stages
(setting performance standards, developing tasks, adopting scoring,
and reporting procedures) are undertaken, it is critical that the
outcomes be in alignment with the conceptualizations of
curriculum and instruction set forth in the Standards.

The Standards are built on a set of assumptions about the
nature of mathematics, about learning, and about teaching. As 
described earlier, mathematics is viewed as a progressive human 
activity. To know mathematics is to engage in the activities of 
doing mathematics, such as conjecturing, sense making, and 
communicating mathematical arguments. Another fundamental 
assumption is that school mathematics is not solely for the elite, 
but for all students. All students come to school with certain 
mathematical concepts already forming, and the role of the 
teacher is to build on that knowledge so that students gain 
increasing mathematical power. The teacher's role is no longer 
that of a deliverer of knowledge, but that of a guide and facilitator 
for student growth. 

To be in accord with the work completed thus far on the
mathematics curriculum and methods of instruction, the next
stages in the development of an assessment system must take these
fundamental assumptions into account. This implies, for example,
that performance standards should be based on students solving
nonroutine problems rather than performing conventional
computational procedures. Assessment tasks should allow students the
opportunity to demonstrate their mathematical power. They should
require the active engagement of students in doing mathematics
rather than making a passive response to routine questions. Also,
scoring and reporting methods should be designed to inform
individual students about their own learning rather than to rank
students in groups. 



ISSUE 5: SPECIFICATION OF PERFORMANCE 
STANDARDS 

The NCTM Curriculum and Evaluation Standards (1989) describe
what students should have an opportunity to learn. But to establish
an assessment system for school mathematics that is aligned with
that vision, performance standards must be set that will describe
what students are supposed to know and be able to do in
mathematics. Making the connection between curriculum stan- 
dards and performance standards is a difficult task, but one that 
needs to be confronted. 

A conventional approach to testing specifies both content and 
levels of performance; it then crosses them to form a
"content-by-process" matrix. Although this approach affords test developers the
assurance that each content area of mathematics and each level of
performance (or type of process required) will be "covered" by the
test, the design may also work against the assumptions about
mathematics mentioned earlier. That is, separating mathematics 
into individual cells of "Number" as content and "Computation" 
as process, for example, sets up a situation for test writers to write 
items that fit neatly into those cells. The design does not easily 
allow for items that require more than one content area or more 
than one process in their solution. As an example, consider this 
item, similar to items found on recent standardized achievement 
tests (Romberg & Wilson, 1992):



Which is best to use to find an estimate for 791 ÷ 19?

A. 700 ÷ 10
B. 700 ÷ 20
C. 800 ÷ 10
D. 800 ÷ 20

This item would fit (all too neatly) into a content area of Number
and a process area of Computation/Estimation. When tests are
composed primarily of items like these, the underlying assumption
is that mathematics is a collection of discrete content areas and 
that the doing of mathematics occurs in a separate, compartmen- 
talized, hierarchical fashion. 
For a task to be considered "authentic," it should not easily fit
into neat categories of single content areas and single processes. 
Solving nonroutine problems usually involves multiple processes 
and cuts across mathematical domains. Making connections
necessarily involves blurring the lines between content and processes.
The task (NCTM, 1989, p. 14U presented in figure 1.2 illustrates 
the interconnectedness of problem solving, communication, and 
reasoning and involves the content areas of geometry and discrete 
mathematics. 
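
One way a student might begin investigating the robot task in figure 1.2
numerically is sketched below. The task gives no coordinates, so the
robot positions here are hypothetical, as are the function and variable
names; the sketch also assumes that each robot travels straight along
the line to the bin. It compares the total distance for every candidate
bin location and checks the result against the median position, a
conjecture that such an exploration tends to suggest.

```python
from statistics import median


def total_distance(bin_position, robot_positions):
    """Sum of the distances from each robot to the supply bin."""
    return sum(abs(bin_position - r) for r in robot_positions)


# Hypothetical positions of the nine robots along the assembly line.
robots = [0, 1, 3, 4, 7, 8, 10, 13, 15]

# Try every whole-number location between the first and last robot.
candidates = range(min(robots), max(robots) + 1)
best = min(candidates, key=lambda x: total_distance(x, robots))

print(best, total_distance(best, robots), median(robots))
```

For these positions the best candidate coincides with the median robot.
A student would still need to argue why that holds in general, which is
where the reasoning and communication strands enter.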

To build an assessment program in alignment with the NCTM 
Standards implies that all the elements of the program incorporate 
the four major strands of emphasis: problem solving,
communication, reasoning, and connections. This does not imply that every task
would necessarily include all four standards, but that the assessment
program as a whole would incorporate all four of them at each level.
It would also be essential that these four strands not be separated into
distinct "process categories," but rather that the items reflect their 
necessary overlap and interrelatedness. 



ISSUE 6: DEVELOPING AUTHENTIC TASKS

Increasing attention is being given to notions of "authentic assess- 
ment." Definitions or criteria for authentic assessment are being 
developed that are built on the framework of the reform curriculum 
in mathematics education. For an assessment system to be
considered "authentic," it must acknowledge these criteria.

Archbald and Newmann (1988) consider three criteria to be
critical to authentic assessment tasks: (1) disciplined inquiry;
(2) integration of knowledge; and (3) value beyond evaluation.



Nine robots are to perform various tasks at fixed positions
along an assembly line. Each must obtain parts from a single
supply bin to be located at some point along the line.
Investigate where the bin should be located so that the total distance
traveled by all of the robots is minimal.

(Diagram: nine robot positions marked along a line.)

Figure 1.2. An example of a task for grades 9-12 (Reproduced with
permission from Curriculum and Evaluation Standards for
School Mathematics, copyright 1989, by National Council
of Teachers of Mathematics, p. 14P).
Disciplined inquiry refers to the production of new knowledge,
such as that created by scientists or historians. It depends on prior
conceptual and procedural knowledge, it develops in-depth
understanding of a problem, and it "moves beyond knowledge that has
been produced by others" (p. 2). Integration of knowledge means
that authentic tasks must consider the content as a whole, rather
than as a collection of knowledge fragments. Students must "be
challenged to understand integrated forms of knowledge" and "be
involved in the production, not simply the reproduction, of new
knowledge, because this requires knowledge integration" (p. 3).
The third criterion, value beyond evaluation, refers to the idea that
authentic tasks should possess attributes that make them
worthwhile activities beyond their use as evaluative tasks. An example
would be a task that results in discourse, an object, or a
performance. An authentic task might also have value for the
collaborative opportunities it provides.

Although these criteria are more broadly based, Lajoie in 
chapter 2 develops a set of criteria for authentic assessment specifi- 
cally in mathematics. This framework does not contradict the 
more general notions of Archbald and Newmann, but makes more
explicit the ways in which the content of mathematics influences
the design of assessment tasks. It is built on two primary
foundations: the NCTM Standards (1989) and current learning theory.
The Standards are predicated on two basic assumptions: the first,
that knowing mathematics is doing mathematics, and the second,
that there should be four goals for school mathematics content:
problem solving, communication, reasoning, and connections.
From current learning theory, Lajoie (1991) chooses situated
cognition and social constructivism to form a foundation for the
definition of authentic assessment.

Building on concepts of mathematics in the Standards and on 
learning theory, Lajoie defines seven principles for a definition of 
authentic assessment: 

1 . It must provide us with multiple indicators of the learning 
of the individual in the cognitive and conative dimensions that 
affect learning. The cognitive dimensions include content
knowledge, how that knowledge is structured, and how information is
processed with that knowledge. The conative dimensions should 
address students' interest in and persistence on tasks, as well as 
their beliefs about their ability to perform. 

2. It must be relevant, meaningful, and realistic. It must be
instructionally relevant, as indicated by its alignment with the
NCTM Standards. It must relate to pure and applied tasks that are
meaningful to students and that provide them with opportunities 
to reflect, organize, model, represent, and argue within and across 
mathematical domains. 

3. It must be accompanied by scoring and scaling procedures 
that are constructed in ways appropriate to the assessment tasks. 

4. It must be evaluated in terms of whether it improves 
instruction, is aligned with the NCTM Standards, and provides 
information on what the student knows. 

5. It must consider racial or ethnic and cultural biases, gender 
issues, and aptitude biases. 

6. It must be an integral part of the classroom. 

7. It must consider ways to differentiate between individual
and group measures of growth and to provide for ways of assessing
individual growth within a group activity (pp. 30-31).

This set of criteria, in defining authentic assessment in
mathematics, could serve as a guideline for an authentic assessment system for
school mathematics. It incorporates current learning theories and
emphasizes the necessary alignment with the reform curriculum.

The format of authentic tasks may vary. In fact, in keeping with
the need for multiple sources of information, no assessment system
should be limited to a single form. In chapter 4, de Lange discusses
the various formats of mathematical tasks, from multiple choice to
portfolios, and sheds some light on what is meant by an "open" item.



ISSUE 7: MEASURING STATUS, GROWTH, OR A 
COMBINATION

It is clear that all forms of assessment, including traditional
standardized tests, are designed to measure the present status of
student thinking. Traditional measuring instruments were created
to yield highly reliable scores on a single dimension, with the
ultimate purpose of linearly ranking students on that dimension.
This factorylike image of education belongs to an earlier age when
behaviorist theories of learning held sway. Constructivist
approaches to assessment require greater emphasis on a developing
picture of individual growth. The emphasis has shifted from an
industrial model of quality control to an effort to describe an
individual's attainment of mathematical power. There is a need for
more than status information; instead of a static score, what is
needed are profiles of growth over time.
A single score, although useful for ranking students at a fixed
point in time, places the emphasis of the assessment on the
measure used. When assessment is seen as a means to
understanding a student's growth over time, the emphasis shifts to the process
of learning. Such a change vastly increases the utility of the 
information gained for all the audiences involved. Students, no 
longer limited to information focusing only on comparisons with 
peers on a single measure, gain an understanding of their own 
learning. Measures of growth over time are immensely consequen- 
tial for teachers in planning instruction. As students gain more 
sophistication in their problem-solving strategies, assessment can 
best inform students and teachers by describing growth in that 
ability longitudinally. Parents and administrators also gain a deeper 
understanding of student learning when information is provided 
that goes beyond a single static score. 



ISSUE 8: SCORING: BY WHOM AND IN WHAT
FORM?

Scoring on traditional standardized tests has historically been done
by machine, which works well with multiple-choice,
single-answer items. But different forms of assessment with open-response
items, for example, require professional judgment to score. The
issue then is, Can we trust teachers to reliably mark their own
students' work, and can they be trained to do so? Other countries
have struggled with this issue as well and have responded with a 
variety of strategies and results. 

In The Netherlands, an experiment was conducted to test the 
reliability of teachers' judgments. Fifteen teachers were asked to 
score the work of five students on an extended open-ended task. 
The teachers were given no information on the students, no 
information on the results of each student's previous work, and no 
indication on how to score the tasks, other than to use a ten-point 
scale. The responses yielded high interrater agreement among the 
fifteen teachers. In 81 percent of the cases, two scores of a student's 
task lay within 1.0 points of each other. When all the averages of 
any two scores and the average of all scores (considered the "correct
grade") were calculated, roughly 90 percent of the averages of any
two scores lay within a half point of the "correct" grade (de Lange,
1987, pp. 209-220). Presumably, teachers given a rubric (or
entrusted to develop one) and trained in its use would yield even
higher rates of agreement.
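
The two agreement figures reported for the Dutch experiment can be
made concrete with a short sketch. The code below is only an
illustration of how such percentages might be computed for one
student's task; the function names and the five example scores are
hypothetical and are not taken from de Lange's study.

```python
from itertools import combinations
from statistics import mean


def pairwise_agreement(scores, tolerance=1.0):
    """Fraction of rater pairs whose scores differ by at most `tolerance`."""
    pairs = list(combinations(scores, 2))
    close = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return close / len(pairs)


def pair_average_agreement(scores, tolerance=0.5):
    """Fraction of two-score averages lying within `tolerance` of the
    mean of all scores (the mean plays the role of the "correct grade")."""
    overall = mean(scores)
    pairs = list(combinations(scores, 2))
    close = sum(1 for a, b in pairs if abs((a + b) / 2 - overall) <= tolerance)
    return close / len(pairs)


# Hypothetical scores from five raters on a ten-point scale (assumed data).
example_scores = [6, 7, 7, 8, 6]

print(pairwise_agreement(example_scores))      # share of pairs within 1.0 point
print(pair_average_agreement(example_scores))  # share of pair averages within 0.5
```

With real data, the same two functions would be applied to each
student's set of scores and the results pooled across students.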

There are models in other countries of statewide mandated
examinations that rely on the expertise of classroom teachers for
scoring, while incorporating strategies for external verification. In 
Victoria, Australia, an external verification process has been used 
for the Certificate of Education exams, which are composed of four 
different kinds of tests, ranging from multiple choice to extended 
projects. The process, which checks the scoring of teachers and 
ensures reliability, has been found to have significant professional 
development benefits for the teachers involved as well. Before the 
examinations are undertaken, there is a training activity at which 
student work from previous years is examined in an effort to bring 
teachers to a common understanding of the desirable attributes of 
student reports and the criteria for assigning grades. After teachers 
have assigned grades, they submit typical examples of student 
work and difficult or ungraded cases to a regional review panel.
Eventually, examples of work from each region are forwarded to a
statewide panel for review. The review panels suggest such
alterations to grades as seem appropriate, and teachers may then
reassess students' work, taking the panel's advice into account. 

On a final verification day, all students' work is brought to a 
regional meeting. Teachers are divided into verification teams 
under the direction of a review panel member. These teams reas- 
sess a number of reports selected by the panel member. Teachers do 
not reassess work from their own schools. If significant variation is 
found between the initial and second grades, further sampling is
done and a whole class might be re-marked by at least two members
of the verification team. In the trials the grades assigned by
verification teams in this way have been "remarkably consistent
because sufficient professional development had taken place to
ensure a common understanding of the grading process. The grades
resulting from this process are as comparable as one can reasonably
expect short of double marking the entire collection of reports"
(Stephens & Money, 1991, p. ,S1).

Wilson (1992) has offered another strategy for combining the
expertise that teachers have regarding individual students in their
class with the more tightly controlled ratings of external
examiners. He offers an analytical scheme that combines the two types of
scores statistically in a manner that can give credence to both types
of assessment.

The Victoria example illustrates that such processes, which
rely on teachers to score the tests with sufficient checks in place,
can be put into practice and that there are supplementary benefits
in the professional development of teachers. Teachers who meet to
discuss the rubrics and expectations for student work can develop a
common language for the assessment of these tasks, but they can 
also take back to their classrooms a clearer vision of the kinds of 
mathematical activities valued. Teachers who cooperate to create 
and use agreed rubrics for evaluating student work gain valuable 
experience at using alternative forms of assessment. In a context in 
which they are reasonably sure of the reliability of their judgments
against those of other teachers, they can in turn feel more confident 
in their own grading for instructional decision making. At the same 
time, formal rubrics and strategies for accomplishing interjudge 
agreement can support the development of alternative assessment 
tasks, prompt efforts to improve the mathematics curriculum to 
enable students to achieve those goals, and help instill public 
confidence in the use of school-based information for accountability. 



ISSUE 9: MAKING REPORTS OF RESULTS
UNDERSTANDABLE TO THE PUBLIC

Results of student performance on an examination need to be 
reported to several audiences: students, parents, teachers, admin- 
istrators, and policy makers. The form and substance of these 
reports will necessarily vary according to the audience. Neverthe- 
less, it is essential that they are designed to be easily understood by 
their constituents, at the same time that they preserve the richness
of the information. It would do little good to replace traditional 
tests with nontraditional formats if the conceptually abundant 
information gathered is collapsed into a single numerical score or 
left in an uninterpreted form.

The first question that must be addressed in the design of
reports is, What kinds of information does each audience (students,
parents, teachers, administrators, or policy makers) need to make
informed decisions? Because even the unit of analysis is different
for each audience, the information required will likewise vary.
Once that element is decided, the appropriate data sources and
means of analysis can be determined.

One model for reporting is being developed by Lesh, Lamon,
Gong, and Post (1992). Their "learning progress maps" are
computer based, interactive, multidimensional, and decision specific,
yet relatively simple in design. The maps report student progress
along three dimensions. The vertical axis represents the most
important conceptual models and reasoning patterns that students
are encouraged to construct at a given grade level. On the
horizontal axis are the basic mathematical strands (such as patterns,
quantities); and the depth axis corresponds to the increasing
structural complexity of the underlying conceptual systems. The result
is a visual image of student learning, in the form of peaks and 
valleys. 

Another proposed framework for reporting student progress 
in mathematics learning (Romberg, 1987) utilizes Vergnaud's no- 
tions about "conceptual fields." The idea is that, rather than 
breaking down mathematics into two dimensions of content and 
processes, a vast number of different forms of problem situations in 
mathematics can be represented by a small number of symbols and 
symbolic statements. For example, the related mathematical
concepts of addition and subtraction of whole numbers have been
defined by Vergnaud as the conceptual field "additive structure."
Developing such fields can yield a map of a domain of knowledge. 
Such maps could free test constructors from the bind of filling in 
cells of a matrix. 

A common challenge to current efforts at assessment reform
is the development of profiles of student learning that are
meaningful, concise, valid, and reliable, at the same time that they are based
on a framework built on current notions of learning mathematics.
One such attempt is the scheme offered by Zarinnia and Romberg
(1990), shown in figure 1.8. The subcategories under Doing
Mathematics are not the usual mathematical terms, such as space or
logic. The idea is to try to capture a more realistic picture of genuine 
mathematical activity. The language of the chosen terms empha- 
sizes mathematics in terms of active engagement, creative reflec- 
tion, and productive effort. 



Doing          Representing/            Mathematical     Mathematical
Mathematics    Communicating            Community        Disposition
               Mathematics

Locating       Mental and               Individual       Valuing math
Counting       represented              activity         confidence
Measuring      facility in              Collaborative    Beliefs about the
Designing      communicating:           action           mathematical
Playing          verbally                                enterprise
Explaining       visually                                Willingness to
                 graphically                             engage and
                 symbolically                            persist

Figure 1.8. Recommended reporting categories (Zarinnia & Romberg,
1990, p. .^P).

What each of the three alternatives described briefly here has
in common is an attempt to respond to current thinking about the
learning of mathematics. No longer will a single numerical score
suffice to describe the complex processes involved in engaging in 
the kinds of mathematical activity described by the Standards. A 
reporting system that seeks to support, not undermine, an authen- 
tic assessment system will have to be sophisticated enough to 
embrace a more complex view of the learner and an enlightened 
view of what it means to do mathematics at the same time that it 
generates information that is useful to students, teachers, parents, 
administrators, and the public for decision-making purposes. 

The list of issues discussed in this chapter is not meant to be 
exhaustive, nor have we tried to resolve all of the dilemmas of each 
one. The other chapters of this book will elucidate some of them. 
Our intent is to bring to light some of the important matters that 
need to be addressed at each of the stages in building an assessment 
system for mathematics, from the initial assumptions made about 
mathematics and learning to the reporting schemes used. Only 
with careful attention to each stage of the process, and a clear vision 
of the overall framework, can a coherent authentic assessment 
system be constructed.



2 ❖ A Framework for Authentic Assessment
in Mathematics 

Susanne P. Lajoie 



Vast differences exist between the tasks learned in school math- 
ematics and those tasks mathematicians or users of mathematics 
actually carry out (Lampert, 1990; Pollak, 1987; Resnick, 1988).
How we learn inside the classroom is different from how we learn 
outside of the classroom (Resnick, 1987). Resnick elaborates that 
the focus inside the typical American classroom is on what the 
individual learner can accomplish independent of the group or 
tools for learning such as calculators. In contrast, outside-of- 
classroom learning situations often are group situations, where 
knowledge must be shared and where tools are often available to 
enhance or extend our knowledge. Inside the classroom students 
are taught to manipulate symbols and abstract principles, but
outside the classroom learning often is concrete and situated in the 
context in which it will be used. Given the differences between the 
two, the term authentic has been used to suggest that some 
classroom activities are lacking in realism and to conjure up an 
image of an alternative approach. 

Requests for more authentic classroom activities have led to 
requests for authentic forms of assessment. These requests have 
come from several populations — ranging from students, teachers, 
and district and state personnel to a national agenda on the 
integration of instruction and assessment. Although the rhetoric is 
convincing, the images of authentic activities and assessment are 
still imprecise. This chapter is written to stimulate discussion on
ways that authentic assessment can be operationally defined in the
area of school mathematics. 

The distinctions between in-school and out-of-school learn- 
ing have implications for defining authentic forms of assessment. 
In considering these distinctions, we must also consider whether a 
framework for authentic assessment should incorporate a guide for
authentic instructional activities in the classroom. Should we
be concerned with mathematical knowledge that transfers to
everyday uses of mathematics or should we consider authentic
mathematics as something mathematicians do in their domain?
Several researchers have demonstrated that incorporating everyday
uses of mathematics into instruction improves both interest and
performance (Fong, Krantz, & Nisbett, 1986; Mosteller, 1988). One
research strategy has been to identify the learners' informal
knowledge of mathematics (statistics in particular), which comes from the
learners' everyday experiences, and formalize this knowledge with
appropriate instruction (Fong et al., 1986). It is still an empirical
question as to what mathematicians do in their domains and how 
such knowledge could be modeled in the traditional classroom. 

Part of what makes mathematics authentic is the fact that 
mathematics is essential to the needs of our rapidly changing 
society. Computer technology has become more the norm in the 
workforce and thus there is a greater need for mathematically 
literate workers. Mathematical literacy can be measured as the
ability to understand the complexities of technologies, to be able 
to communicate and ask questions, to assimilate unfamiliar 
information, and to work cooperatively in teams. These skills 
are skills for lifelong learning. The fact that our society demands 
job mobility ensures the need for flexibility and adaptability, 
where individuals must be capable of learning new information
quickly and communicating what they understand or do not
understand. Adaptation can be fostered by providing multiple
learning contexts that encourage students to value mathematical
interpretations in a variety of interrelated experiences.
Communication can be fostered if schooling helps students learn the
language of mathematics and if schooling provides opportunities
to conjecture and reason (Lampert, 1990; Resnick, 1987).
The Department of Labor and the business community are
recommending that we reconsider the type of criteria that
certify high school graduates and perhaps include certificates 
that reflect the skills that are required in the workforce (see 
Whetzel, 1992, for a review of the Secretary's Commission on 
Achieving Necessary Skills [SCANS] test). 

My primary focus in defining authentic assessment in
mathematics is to provide a robust perspective of the individual
learner's understanding of mathematics. Several audiences are
considered as I define worthwhile mathematical tasks from the
mathematics educator's perspective, followed by a description of
two interrelated theoretical perspectives on authentic activities
as described in the literature on situated cognition and social
constructivism.



WORTHWHILE MATHEMATICAL TASKS

The National Council of Teachers of Mathematics' Standards 
(1989) present goals for worthwhile or essential mathematics that
are designed to make students at each level of schooling math- 
ematically powerful. The first four of the standards represent 
overarching goals that should be considered for all mathematics 
content at all levels. Any specific mathematical content, according 
to these four, should be designed to provide students with
opportunities for mathematical problem solving, communication,
reasoning, and making connections. These goals must be translated into
tasks that exemplify authenticity. Only then can a framework for
authentic assessment be developed.

A definition of an authentic mathematical activity emerges 
from the general assumptions of the NCTM Standards. One 
assumption is that knowing mathematics is doing mathematics. 
Doing mathematics refers to gathering and discovering knowledge
in the course of solving genuine problems where knowledge emerges 
from experiences that are challenging but solvable. One way to 
increase such opportunities is to provide students with experiences 
in building mathematical models, structures, and simulations 
across multiple disciplines. Model building and discovering math- 
ematical patterns is often a dynamic constructive process. Tech- 
nology can be used to facilitate these cognitive processes as well as 
record them. It can be used to assess developmental changes in 
reasoning, hypothesis formulations, verifications, and revisions. 
Technology can also serve as a medium for instructional manipu- 
lation where small changes in the instructional environment may 
account for changes in the learners' acquisition of knowledge. 



PROBLEM SOLVING 

Activities that give students experience with problem solving can 
emerge from problem situations. These situations can be used to 
motivate students and serve as a context in which information is 
learned and knowledge is re-created across grades. For example, the 
use of statistics in everyday situations is apparent in any
newspaper. Simply turn to the weather forecasts for predictions, the sports
section for rankings, and the business section for percentages.
Pereira-Mendoza and Swift (1981) discuss ways in which the
real-world context can be brought to the mathematics classroom by
using newspaper articles that feature statistics as a starting point
for discussions to provoke statistical reasoning. Another way to
make statistics more authentic is to use meaningful data sets as
opposed to "cooked up" ones (Singer & Willett, 1990; Tanner,
1985). By using real data, students come to understand the phe- 
nomenon and see how and why statistics are useful in authentic 
contexts. Although "real" data are often complex, messy, and 
frequently culturally based, their use does provide opportunities 
for multiple strategies and solutions to evolve (Zarinnia & Romberg,
1992). Real-world problems can provide more freedom for learners 
to pursue questions that reflect their personal interests. 

Problem solving with mathematics involves modeling the 
problem and formulating and verifying hypotheses by collecting 
and interpreting data using pattern analysis, graphing, or comput- 
ers and calculators. Technology is a powerful tool; it permits 
learners to manipulate data and see the consequences of their work 
in a few seconds. Some progress is being made in examining the 
effectiveness of graphing calculators and the use of computers in 
facilitating mathematics performance. Wainer's (1992) research on
graphical displays provides us with insight into mathematical 
problem solving. One insight is that a graph, if properly drawn, can 
facilitate the discovery of relationships in the data and often answer 
both simple and complex questions. Wainer describes how graphs 
can be used to assess different levels of reasoning when paired with 
the appropriate questions. Bertin (1973) developed such a set of 
questions. For example, an elementary question may involve 
finding one data point, whereas an intermediate question could 
involve finding trends among multiple points in the data. The most 
sophisticated question would involve testing the learner's under- 
standing of the deep structure of the data, which might include 
identifying multiple trends or understanding the overall picture. 
Intuitively, assessment of learning through questioning seems 
possible. However, Wainer cautions that questions and graphs 
alone may not result in appropriate assessments of the learner's 
reasoning capability, simply because the graph may be poorly 
constructed and thus misrepresent what the learner can or cannot 
see in the data. 

The use of computers to promote mathematical problem 
solving is becoming more popular. Reed (1985) examined the 
effects of computer graphics on improving estimates in algebra 
word problems. He varied the instructional environment so that 
students either learned by viewing computer simulations or learned 
by doing, where they saw the consequences of their estimates on
the computer screen (visual feedback, using computer graphics).
He found that simply viewing graphics did not improve
performance. Viewing by doing, or utilizing visual feedback, was more
successful at improving performance. However, the results were 
somewhat mixed and he concluded that certain displays were 
effective for certain tasks and not for others and that learning by 
coaching would be more effective. In other words, a learning-by- 
doing condition with intelligent feedback provided by the com- 
puter in the context of the student-generated estimates might have 
a stronger influence on performance. 

Problem-solving activities need to include those that apply 
mathematics to the real world and those that arise from the 
investigation of mathematical ideas. Traditional curricula have 
emphasized mathematical ideas. The impetus for developing real 
and relevant problems stems from the need to contextualize math- 
ematical concepts in a concrete rather than abstract manner. Real- 
world situations facilitate connectiveness of knowledge, under- 
standing of contexts and goals, and fewer distractions (see chapter 
4, this book). Equity issues must be considered in the development 
of these real-world problems so that cultural biases are not intro- 
duced. In addition to including applied and pure mathematical 
problem types, problem representations should be varied to provide 
for individual differences — that is, verbal, numerical, graphical, 
geometrical, or symbolic — and to permit several ways of reaching 
a solution. 



COMMUNICATING

Communicating about mathematical ideas permits students to
synthesize information about the ideas. There are a variety of
modes of communication, including reading, writing, discussion,
and listening, as well as concrete, pictorial, graphical, and algebraic
methods. Activities that require students to communicate about
mathematics provide them with opportunities to reflect on and
clarify their own thinking and to develop a communal
understanding of mathematical ideas and notations.

Students need opportunities to present ideas using language to 
ensure that they understand words and their definitions and
meanings. Teachers who structure classes to encourage communication
provide students with opportunities to validate their thinking about
mathematics. They can foster communication by asking questions,
posing problems, or asking students to develop problems. Different
levels of communication can be obtained by interviewing individual
students, by using small groups, or by classroom discussions.
These levels permit students to ask questions, discuss ideas, offer
constructive criticism, and summarize discoveries in writing.
Cultural and gender differences should be considered by those
structuring activities to encourage communication.



REASONING 

Mathematics involves both inductive and deductive reasoning. 
Inductive reasoning is associated with mathematical creativity or 
invention. Deductive reasoning involves understanding the pre- 
mises of a mathematical problem and thinking logically using the 
information given. Challenging problem situations can provide 
opportunities for students to develop mathematical reasoning in a 
variety of contexts. The maturation of mathematical reasoning is 
a long process. Special developmental differences in reasoning, 
especially in grades 5-8, where students make the transition from
concrete to abstract reasoning, must be planned for. The
development of mathematical reasoning could be facilitated in both
instructional and assessment settings if appropriate prompts were
made available to the learners: Why is this true? What if you
changed this? Do you see a pattern?



MAKING CONNECTIONS 

A curriculum that integrates a broad range of mathematical topics
rather than treating each topic in isolation is a connected
curriculum. Number concepts, computation, estimation, functions,
algebra, statistics, probability, geometry, and measurement become
more useful to students when treated in an integrated fashion.
Students can be helped to make connections between the topics if
they are provided with contexts that require their integration when
solving problems. Bryant (1984) discusses the interconnections
between geometry, statistics, and probability. He describes how
each topic has its own language for expressing meaning and how
repeating examples in different words and emphasizing the equivalence
of the various means of expression will provide students with
assistance in drawing the necessary interconnections. It is not
enough, however, to provide connections among mathematical
topics; the connection of mathematics with other topics and with
such disciplines as science, music, and business is also necessary
(Bransford et al., 1988; Rosenheck, 1991). Teachers from other
disciplines can help to identify the mathematical ideas that can be 
explored in their domains. Geography, for example, provides oppor- 
tunities for the use of scaling, proportions, ratio, similarity, and 
other mathematical ideas. Genetics, as a scientific discipline, 
provides ample opportunities for the application of mathematics, 
especially statistics and probability (Ballew, 1981). Ballew
describes how basic statistical techniques of gathering and organizing
data can be used to explain heredity. Using mathematics in specific
contexts promotes attitudes of inquiry and investigation as well as
sensitivity to the interrelationships between formal mathematics
and the real world.

Problem solving, communicating, reasoning, and making 
connections can be seen as curriculum goals that permeate the 
entire mathematics curriculum. Specific content areas also need 
to be addressed: number and number relations, number systems 
and number theory, computations and estimation, patterns and 
functions, algebra, statistics, probability, geometry, and
measurement. In reviewing what the NCTM Standards (1989) deem
worthwhile mathematical activities, it is important to realize 
that a single assessment of such activities will not provide a 
complete picture of a student's intellectual growth. Furthermore, 
different types of assessment are necessary to provide a complete 
picture of the learners' knowledge. In developing new forms of 
assessment, one must determine what methods of assessment are 
best for evaluating various kinds of knowledge. Both individuals 
and small groups should be assessed, but for different skills. 
Small-group learning situations may be useful for measuring the 
ability to talk about and listen to ideas. Individual assessments 
might be better for assessing the learner's ability to synthesize 
knowledge. 

Theories of situated cognition, social constructivism, and 
the influence of the group on an individual's learning can be 
useful in defining authentic activities and authentic assessment. 
Although research on situated cognition is still in its infancy, 
there is evidence that certain activities described by its propo- 
nents are similar to those described as worthwhile by mathematics 
educators. Situated cognition refers to situating learning in the 
context in which one plans to use the knowledge. Problems must 
be realistic or authentic in the sense that the applications of 
knowledge are made apparent to the learner while the learning is
taking place, rather than outside of the context in which it could
be used (Greeno, 1989).



SITUATED COGNITION

Situated cognition has developed out of the cognitive
apprenticeship model of instruction (Collins, Brown, & Newman, 1989). The
notion of a cognitive apprenticeship comes from traditional ap- 
prenticeships, where novices learn their trade from a master. The 
masters share their knowledge with novices, assisting them in 
developing a skill or product. Similarly, cognitive apprenticeships 
are designed around the notion that skilled learners can share their 
knowledge with less skilled learners to accomplish cognitive tasks. 
Cognitive apprenticeships, however, must model cognitive pro- 
cesses that are often difficult to externalize so that novices can 
observe or reflect upon the skills for a particular domain. In theory, 
the cognitive apprenticeship models offer suggestions for which 
skills to model for novices, how to provide scaffolding or assistance 
to less skilled learners, and when to fade such assistance when 
learners demonstrate they can construct their own meaning. Since 
the NCTM Standards (1989) call for an integration of instruction
and assessment, the cognitive apprenticeship model has promise. 
It provides learners with ways to reflect and correct their perfor- 
mance based on assessment feedback. This theory does not provide 
specific guidelines for when and what type of feedback to offer or 
when to drop back on the amount of assistance provided. If this
theory were used to define authentic assessment in an operational 
way for mathematics knowledge, then such criteria would have to 
be developed. 

Scaffolding or adaptive feedback is important in instruction 
and assessment. Vygotsky (1978) proposed that assessment
consider both an individual's actual development or performance on a
task without feedback and the potential development or
performance on a task with feedback during test taking. In traditional
assessment, where learners' actual development is assessed, it 
would be difficult to differentiate between two learners who have 
the same score. The two learners could look quite different from 
one another if assessed in situations where limited feedback was 
provided in the test context. Assessment with feedback could 
measure the learners' potential rather than their actual
performance. Learners may not need feedback the next time they are
tested; thus, the test would become a learning experience in and of
itself. This is a dynamic and adaptive form of assessment
(Frederiksen, 1990; Lajoie & Lesgold, 1992). It is dynamic because
learners can be retested; it is adaptive because learners can learn
from the test. Dynamic forms of assessment can provide feedback
to learners, giving them ways to improve themselves and opportu- 
nities to reach their potential. Tests that serve a learning function 
may also improve learners' motivation and sense of self-efficacy. 



SOCIAL CONSTRUCTIVISM 

The cognitive apprenticeship model is similar to the theory of 
social constructivism (Vygotsky, 1978) in that learning occurs
when one shares knowledge with more capable peers. The NCTM
Standards (1989) emphasize learners' construction, verification,
and revision of mathematical models. They also stress the
importance of fostering problem solving, communicating, reasoning, and
making connections through small-group or whole classroom 
discussions. Situated cognition and social constructivist theories 
fit the NCTM Standards well. 

Several researchers have examined the construction of
mathematical meaning using small groups (Lampert, 1990; Resnick,
1988; Schoenfeld, 1985; Wood, Cobb, & Yackel, 1991). The group
helps facilitate reasoning about mathematics and can also foster
reflection or use of the metacognitive skills necessary to evaluate
mathematical problems (Schoenfeld, 1985). Lampert discusses the
importance of finding a common mathematical language for learn- 
ers to use when communicating ideas. Resnick is particularly clear 
on the necessity of defining a common core of knowledge to 
promote the types of dialogue that Lampert refers to in her work. 
Such dialogues are an important method for demonstrating that
mathematical problems may be conducive to multiple, as opposed 
to single, problem representations. Group problem-solving
situations can provide opportunities for discussion prior to
implementing mathematical procedures (Resnick, 1988).

The theories reviewed here provide great promise for building 
authentic activities as well as authentic assessments. There is a gap 
in the literature on how to operationalize these theories. It is 
difficult to design groups that will ensure the sharing of cognition 
and optimize learning for each group member. Because more 
capable peers assist the less able learners by articulating their 
cognitive processes, we need to know how to design problem- 
solving situations that allow for the articulation of such processes, 
yet provide opportunities for the less skilled to participate in the
overall task. 

Authentic Assessment

Authentic assessment must take place in the context of the 
learning process. It must consider both the learner and the situation 
in which the learner is assessed. Authentic assessment must 
provide information on what the learner knows or does not know 
and on the developmental changes in such learning. Repeated 
measures of appropriate learning indicators must be made to obtain 
a robust picture of the learner's knowledge. These indicators must 
include a range of cognitive and conative abilities so that multiple
perspectives are available for a particular area (see Snow, 1989, for 
insights regarding the assessment of such learning structures). 

Authentic assessment will require instruments that provide 
in-depth perspectives on learning. Collins, Hawkins, and Frederiksen 
(1991) have begun to address the best tools for obtaining these 
perspectives. They suggest that one "picture" does not mean a
thousand words when assessing what learners know. At least three
different assessment mediums, they suggest, ought to be used to
obtain an integrated picture of the learner. Such mediums as
paper and pencil, video, and computers, used jointly,
provide a more authentic picture of the learner than a single
medium. Paper-and-pencil tests, the traditional forms of assess- 
ment, are used to measure students' knowledge of facts, concepts, 
procedures, problem-solving ability, and text comprehension
ability. Collins et al. (1991) suggest broader applications of these tests,
using them, for example, to record how students compose texts and 
documents of various kinds. Students traditionally have been 
assessed on their essays, but other writing tasks such as letters, 
reports, memos, drawings, and graphs can also be used to supple- 
ment compositions. Paper and pencil can also be used to assess how 
well students critique the quality of other documents. 

Video can be used as a medium for assessing students' com- 
munication, explanation, summarization, argumentation, listen- 
ing, and question-asking and answering skills. Video can also be 
used to assess student interactions in the context of cooperative 
problem-solving activities. Video records of dynamic interactions 
can be scored at a later time. They provide opportunities for scoring 
oral presentations, explanations provided in a small group setting, 
and joint problem-solving activities. 

The computer can provide a further perspective on the learner. 
It can effectively track the process of learning as well as a learner's
response to feedback. It can also simulate realistic situations in the 
classroom. The computer provides opportunities for assessing the 
dynamic nature of problem solving and opportunities to
systematically vary the instructional environment on the feedback
dimension and observe the effects on learning outcomes (Lajoie & Lesgold,
1992). The feedback dimension offers us a novel mechanism for
assessing how well or how poorly individuals respond to certain 
learning environments. The ability to track student performance 
provides opportunities for assessing such strategic aspects of
knowledge as hypothesis formation and hypothesis verification, or for
assessing motivational aspects of learning (how persistent
students are at trying to solve the problem), as well as actual learning
outcomes. Thus, computers make possible the dynamic
assessment of relevant criteria. Computers can also provide a less
structured means of assessment, in which students are tracked as they
explore mathematical content areas (Shaughnessy, 1992).

Collins et al. (1991) suggest that the use of these three
mediums of assessment will provide a more robust picture of the 
learner. The assessment medium, however, is only as authentic as 
the task that the learner is being tested on. Care must be taken to 
define the types of student records that will be collected and to 
ensure that such records reflect the performance indexes most
relevant to that medium. 

Finally, the purpose or use of assessment must be considered. 
If assessment results are used by the learners or teachers, then the 
assessment tools must be available in the classroom on a regular
basis, weaving together instruction and assessment. The
interdependence of instruction and assessment has been referred to as a
systemic approach (Frederiksen & Collins, 1989; Salomon, 1991)
and is often used in the context of performance assessment (Baron,
1990; Linn, Baker, & Dunbar, 1991; Wolf, Bixby, Glenn, & Gardner,
1991). Learners should be able to use the tests as tools to
reflect on their strengths and weaknesses (Nitko, 1989). Tests or
assessment tools should be transparent in the sense that those who
are being assessed understand the criteria on which they are being
judged so they can improve their performance (Frederiksen &
Collins, 1989). Frederiksen and Collins suggest that one way to
ensure that the assessment criteria are transparent is to provide a
library of exemplars for students to visit. This library provides
copies of records of student performances that have been critiqued
by master assessors in terms of the relevant criteria. Such a library
would help students evaluate their own performance and perhaps
provide landmarks of success for which to strive. In addition to
self-assessment, feedback should be given to students after a test is
taken to help them improve their performance. Teachers can be
assisted in using the assessment tools to determine what concepts
students have misinterpreted. 



PRINCIPLES FOR OPERATIONALIZING AUTHENTIC 
ASSESSMENT 

We seek to define and operationalize authentic assessment to 
improve learning. Thus, students should find undertaking an 
assessment task a learning experience. And teachers should learn 
what their students know or do not know as a result of the 
assessment task. Some tentative principles for operationally defin- 
ing authentic assessment grow out of the theories and literature 
reviewed:

1. It must provide us with multiple indicators of the learning
of the individual in those cognitive and conative dimensions that 
affect learning. The cognitive dimensions should include content 
knowledge, how that knowledge is structured, and how informa- 
tion is processed with that knowledge. The conative dimensions 
should address students' interest in and persistence on tasks as well 
as their beliefs about their ability to perform. Student interest in a 
topic often increases in conjunction with a deeper conceptual 
knowledge of that topic. Students' choices may reflect their level 
of engagement and interest. These indicators must be examined 
repeatedly if they are to provide us with information on learning 
transitions or developmental maturity. Multiple mediums of
assessment are necessary to provide valid indicators; that is,
indicators that we define as authentic. One measure, obtained by a single
medium, is unlikely to provide us with sufficient information on
an individual. Varied types of procedures are necessary for
gathering assessment information (Collins et al., 1991; Romberg, 1993).

2. Authentic assessment must be relevant, meaningful, and 
realistic. It must be instructionally relevant, as indicated by its 
alignment with the NCTM Standards (1989). It must relate to pure 
and applied tasks that are meaningful to students and that provide 
them with opportunities to reflect, organize, model, represent, and 
argue within and across mathematical domains. 

3. It must be accompanied by scoring and scaling procedures
constructed in ways appropriate to the assessment tasks.

4. It must be evaluated in terms of whether it improves 
instruction, is aligned with the NCTM Standards, and provides 
information on what the student knows. 



5. It must consider racial or ethnic and cultural biases, gender 
issues, and aptitude biases. 

6. It must be an integral part of the classroom. Because 
teachers are more likely to teach the information to students that 
appears on tests, assessment tasks must be aligned with authentic 
activities such as those outlined in the NCTM Standards. Teach- 
ers need to be an integral part of the assessment loop so that they 
can learn from assessment information and structure their instruc- 
tion accordingly. 

7. It must consider ways to differentiate between individual 
and group measures of growth and provide for ways of assessing 
individual growth within a group activity. 

Alternatives to paper-and-pencil multiple-choice tests do 
exist. Those listed here incorporate several principles of, and hold 
promise as authentic testing forms for, the assessment of math- 
ematics learning: 

Australian IMPACT Project

A set of studies was conducted in Australia to facilitate communi- 
cation within the college-level mathematics classroom (Clarke, 
Stephens, & Waywood, 1992). Journals were kept by students and
used by both teachers and students to foster a dialogue about what
the students were learning. The quality of student journals pro- 
gressed from simple narratives that described concepts, to summa- 
ries that integrated mathematics knowledge, to dialogues regard- 
ing what questions should be addressed and what meaning could be 
constructed as well as the connections of their work with other 
mathematics knowledge. These journals were beneficial to both 
teachers and students because they provided opportunities for 
dialogue not possible during a regular class period. They demon- 
strate that instruction and assessment can be integrated in the 
classroom. Student journals could provide us with new techniques 
for authentically assessing mathematical communication skills by
providing the mechanism for examining transitions in develop- 
mental maturity in these skills. 

Vermont Portfolios

Portfolios are promising as an assessment tool because they provide 
multiple examples of the students' work and provide students with 
experience in generating mathematical ideas, seeing mathematics 
as part of the culture, and being enculturated in the mathematics
experience. What is particularly intriguing about portfolios is that
multiple audiences can use them to obtain knowledge of the 
learners, teachers, and curriculum. Guidelines are needed, how- 
ever, on how to score such materials. 

Connecticut Common Core of Learning Project 

The Connecticut Common Core of Learning Project (Baron, 
Forgione, Rindone, Kruglenski, & Davey, 1989) provides learners
with authentic uses of mathematics. Assessment consists of in- 
depth evaluations of learners in the context of problem-solving 
situations that may take a week to complete. This project embodies 
systemic assessment in that instruction and assessment are inte- 
grated. Teachers are provided with assessment tools, in the form of 
scoring templates, that facilitate their task of assessing learning. 
This project provides an example of how to examine both indi- 
vidual and cooperative group problem-solving activities and, in 
doing so, provides insights as to how students form their own 
hypotheses by comparing theirs with other hypotheses and how 
they generalize concepts from one problem situation to another. 

The California Assessment Program

The California Assessment Program (1989) has addressed the
concerns of the NCTM Standards (1989) by providing students
with opportunities to demonstrate their construction of
mathematical meaning consistent with their mathematical
development. Open-ended questions are provided that give students
opportunities to think for themselves and express their ideas.
Communication is fostered in classroom discussions as well as in writing
tasks. The data from this project provide a wealth of information
regarding students' misconceptions and reasoning abilities. 

Cognitively Guided Instruction 

In the Cognitively Guided Instruction project (Carpenter, Fennema, 
Peterson, & Carey, 1987; Carpenter & Fennema, 1988),
instructional decisions are based on careful analyses of student knowledge
and the goals of instruction. Problems are selected that closely 
match the student's knowledge level. The assessment emphasis is 
on the learning processes of students. Individual and group data are 
collected. 

Problem Situations

De Lange (1987) has designed mathematical problem situations 
composed of multiple items with varied levels of difficulty. In his 
assessment of the Hewet Mathematics Project in The Netherlands, 
five different tasks were used to gather information: a timed 
written task, two-stage tasks, a take-home examination, an essay 
task, and an oral task. These provided a multifaceted evaluation of 
the learner. The two-stage tasks are especially interesting, in light 
of our principles of authentic assessment. Stage one includes open- 
ended questions and essay questions. These items are scored and 
returned to the student. In stage two, students are provided with 
their scores from stage one, allowed to take the stage-one tests 
home, and given as long as three weeks to answer the same 
questions. The final assessment includes scores from stage one and 
stage two. Students can learn from their mistakes and from the 
feedback regarding their mistakes, making the testing process an 
interactive one that assists students in reaching their potentials. 

Superitems

Superitems are designed to elicit mathematical reasoning about 
mathematical concepts (Collis & Romberg, 1991). The items are
built to assess four different levels of mathematical maturity. At 
level four, the most mature level, the learner must articulate some 
understanding of the mathematical concepts either in words or 
symbols. The tasks can be used to obtain measures of developmen- 
tal reasoning and to serve as a first step in the identification of 
learning transitions in mathematical content areas. 

Many other alternative forms of assessment demonstrate a 
promise of being authentic. For instance, there are software pro- 
grams and multimedia approaches to learning that allow learners to 
explore multiple forms of representations when learning
mathematics (one example is the function analyzer, which provides visual
representations of mathematical concepts [Harvey, Schwartz, &
Yerushalmy, 1988]). We need to evaluate these new technologies
along with the alternatives listed previously and consider what other 
options exist for extending our notion of authentic assessment. 

I have laid out a tentative framework for the development of 
authentic forms of assessment. These, and other alternative forms 
of assessment that incorporate new technologies, hold promise for 
fitting within an operational definition of authentic assessment.
Several parts of the framework require additional research. We will
need to determine how cognitive and conative learning indicators 
can be operationalized in the context of an assessment task. We 
will need to study how to obtain frequent and valid measures of 
learners' performances. And we will need to define what we are
assessing in individual and group situations. Finally, when consid- 
ering the multiple audiences that may use measures obtained by 
authentic means, we must keep equity issues in focus. 

Mathematics education reform is currently a topic of great interest
in this country, with much of the attention focused on new goals 
for the school mathematics curriculum and instruction. At the 
heart of the reform discussion and fueled by evidence of poor 
student performance in mathematics on national and international 
surveys is a call for raising national standards to a level considered 
to be "world class." The reform movement and the perceived need
for national standards of mathematics achievement have led mem- 
bers of the business, government, and education communities to 
focus attention on assessment, especially on the development and 
implementation of a national examination system to monitor 
progress toward the attainment of higher performance standards. 
Far less attention, however, is being paid to discussions about the 
ways in which assessment information — whether from national or 
local tests or from classroom assessments — can be used to guide 
instructional decision making to improve mathematics teaching 
and learning for all students. This topic — assessment for the pur- 
poses of instructional guidance — is the focus of this chapter. 

Education professionals — teachers, school administrators, cur- 
riculum supervisors, counselors, and curriculum developers — rou- 
tinely make decisions that affect the form and content of students' 
instructional experiences in mathematics. For example, school
administrators determine the allocation of resources to support
instructional programs; teachers and counselors make decisions regarding
student access to portions of the instructional program, such as 
special classes for exceptional students; and curriculum developers 
and supervisors make decisions that influence the content of
instruction and, often through staff development, the form in which the
content will be taught. These macro-level decisions can be and often 
are informed by student assessment information, as are the micro- 
level decisions frequently made in classrooms by teachers. 

To guide the instructional programs they provide to students,
classroom teachers make frequent decisions about the differentia- 
tion of instruction, about the inclusion of topics in a lesson 
sequence or homework assignments, about the pacing of the 
coverage of topics, and about the selection of teaching methods. 
Their decisions are influenced by information obtained from for- 
mal and informal assessments of their students. For example, a 
mathematics teacher who plans a unit of instruction on the topic 
of measurement for her seventh-grade students might consider 
information available from a wide variety of sources: 

■ Data from state-level or district-level tests that suggest 
areas of strength or weakness in her students' knowledge of 
measurement; 

■ Information from a brief diagnostic test that could be 
designed and administered before beginning the unit; 

■ Data on her students' performance in prior units on 
related topics (e.g., geometry, rational number computa- 
tion) or on prior units in which students have been 
particularly successful; 

■ Records of previous students' final performance and achieve- 
ment on the measurement unit taught in recent years; 

■ Observations of students' level of engagement with vari- 
ous forms of instructional activities. 

Each of these sources of information could influence deci- 
sions about the content of the unit (e.g., selection of topics and 
worthwhile mathematical tasks) and the method of teaching (e.g., 
modifying a problem-based approach that was particularly success- 
ful in a unit on statistics or adapting forms of discourse that have 
enhanced student learning of other topics), as well as decisions 
about pacing and possible differentiation of instruction. Moreover, 
these and other information sources embedded in ongoing instruc- 
tional activities (e.g., observing students working in small groups, 
talking with students, grading homework, or evaluating extended 
projects) could provide additional information during the teaching 
of the unit that can lead to modifications to the original plans, such 
as adjusting the pacing or selecting additional or alternative
mathematical tasks.

The measurement unit example showcases many of the forms
that assessment can take, highlighting the fact that although 
assessment is often thought of as synonymous with paper-and- 
pencil testing, it actually includes techniques that collect a full 
range of information about students and the classroom
environment. The information obtained from this broad array of
assessment activities provides many sources that teachers and others can
use when making instructional decisions. 

In this chapter, we examine a variety of sources of assessment 
information available to teachers and other educational profession- 
als as they make important instructional decisions. First we con- 
sider the information available from external sources, such as 
standardized achievement tests and international or national as- 
sessment surveys. Then, we turn our attention to many kinds of 
internal, classroom-based assessments that can provide important 
information on which instructional decisions could be based. The 
array of internal assessments to be considered includes not only 
traditional assessments such as classroom tests and homework but 
also periodic observations, questioning, and performance on projects 
and open-ended tasks that provide opportunities for students to 
demonstrate the nature and extent of their understandings and 
proficiencies. 



EXTERNAL ASSESSMENTS AS SOURCES OF 
INFORMATION FOR INSTRUCTIONAL GUIDANCE:
POSSIBILITIES AND LIMITATIONS 

Using instructional time to administer some kind of externally 
mandated or externally developed assessment is a quite common 
occurrence in most mathematics classrooms in the United States. 
School districts or state departments of education frequently re- 
quire students to take mathematics tests in order to use the test 
results for program evaluation or student placement, diagnosis, or 
summative evaluation.' In addition to these externally mandated 
tests, some schools or districts also volunteer to participate in 
external assessments that have as their main goal monitoring or 
comparison of student proficiency and achievement on a national 
or international level. After these external tests are administered — 
sometimes long after — the results become available to administra- 
tors and classroom teachers, who are then left with the task of 
interpreting and evaluating the information and making instruc- 
tional decisions. 

Given the pervasive presence of externally mandated tests in
the lives of mathematics teachers and students, it is important to 
consider ways in which the results of these tests may or may not 
provide information that could be in some way useful to classroom
teachers or to others for instructional guidance. This section of the
chapter discusses external assessments of two different types, 
standardized tests and large-scale surveys, and provides some 
examples of ways in which the results of these tests might or might 
not assist mathematics teachers and other education professionals 
in making instructional decisions. 

Standardized Tests 

At some time during the school year every teacher must relinquish 
class time to administer a battery of standardized tests. These tests 
are regularly and widely used to provide a means of measuring 
individual student achievement. The term standardized refers to 
the fact that these tests are designed to be administered, scored, and
interpreted in the same way each time they are used. Many 
standardized tests are nationally normed to allow for comparisons 
among students; others are developed to compare individual stu- 
dent achievement on a predetermined set of content objectives. 
Items on standardized tests are typically multiple choice, a format 
that provides for highly efficient measurement and ease of scoring.
Typically, a mathematics test in a commercially published test 
battery (e.g., the California Achievement Test, the Iowa Test of 
Basic Skills, and the Stanford Achievement Test) consists of two or 
three parts: computation, concepts, and applications or problem 
solving. 

Most school districts rely on multiple-choice tests developed 
by commercial publishers to provide mathematics achievement 
information for their students. Many states and some school dis- 
tricts have developed and administer regularly other standardized 
tests that are tied more closely to their curriculum objectives. These 
tests can be used for a variety of purposes: to evaluate the success of 
a school's instructional program in achieving stated objectives, to 
certify student competence for high school graduation, to identify 
students in need of remedial attention, and so on. Because commer- 
cial standardized tests and those developed by states or districts may 
serve differentially as sources of assessment information for instruc- 
tional guidance, they will be treated separately. 

Commercial Standardized Tests. In general, publishers of
commercial standardized tests declare their purpose to be the
improvement of instruction and learning. For example, one test publisher
states that "the most important use of achievement test results is 
to help improve student learning through instruction" (Science
Research Associates, 1979, p. 32). Nevertheless, these tests are at
the heart of the criticism and calls for change in current testing 
practice, and they have been involved in many skirmishes in the 
battle for education reform in the United States.' 

One source of the conflict over the use of commercial stan- 
dardized tests as measures of mathematics achievement is the 
perceived mismatch between the vision of mathematical profi- 
ciency and competence proposed in publications of the National 
Council of Teachers of Mathematics (1989) or the National
Research Council (1989) and the definition implied by the content of
such tests. In its report on the future of mathematics education in 
the United States, Everybody Counts, the National Research 
Council addressed this mismatch: "As we need standards for 
curricula, so we need standards for assessment. We must ensure 
that tests measure what is of value, not just what is easy to test. If 
we want students to investigate, explore, and discover, assessment 
must not measure just mimicry mathematics" (1989, p. 70). Critics
have argued that the content of current standardized tests stands in 
opposition to the reform vision of competence and proficiency, in 
which such themes as thinking, reasoning, complex performance, 
and problem solving are emphasized in addition to or in place of 
knowledge and basic skill performance. Commenting on the fail- 
ure of current tests to serve as appropriate symbols of an authentic 
vision of mathematical proficiency, Silver and Kilpatrick (1988)
argued: "Another function of testing is to signal to students, 
teachers, and the general public those aspects of learning that are 
valued. When students ask, 'Is that going to be on the test?' they are
inquiring as to the value of the knowledge in question. In general, 
current tests place greater emphasis on those aspects of the
curriculum that are relatively easy to assess than on those aspects that are
highly valued by professionals in the field of mathematics
education" (p. 180).

A closely related criticism of commercial standardized tests 
is the narrow range of curriculum content that typically is covered. 
Deciding the content of these tests is a consensual, market-driven 
process not unlike that associated with creating textbooks (Tyson- 
Bernstein, 1988). To produce a test that is marketable throughout 
the country, test publishers need to ensure that its content at each 
grade level resides at the intersection of the curricular goals of the 
states publishing such goals. Naturally, this leads to narrowing the
range of content that might be included at any given grade level, 
thereby resulting in an overemphasis in mathematics on basic 
computational skills and the virtual exclusion of items that
measure higher level thinking, reasoning, and problem solving
(California Mathematics Council, 1986; Romberg, Wilson, & Khaketla,
1989). 

In addition to the misalignment of test content and curricular 
goals, another frequent criticism of standardized tests involves the 
excessive use of the multiple-choice answer format. Because of the 
demands of testing large numbers of students in short amounts of 
time, commercial test developers have made almost exclusive use 
of multiple-choice items. This format does not allow questions in 
which students are required to produce their own answers, display 
the processes used to obtain an answer, explain the thinking or 
reasoning associated with their responses, or exhibit alternative 
approaches to or interpretations of a problematic situation. More- 
over, the use of multiple-choice formats is usually associated with 
the imposition of a severe time limitation, which prevents stu- 
dents from displaying their competency under conditions that are 
more felicitous for optimal performance. 

The often fiery criticism of commercial standardized tests 
and other forms of externally mandated testing has been fueled not 
only by these limitations but also by evidence that the widespread 
use of these tests can limit and negatively affect the quality of 
mathematics instruction. Some researchers (Romberg, Zarinnia, &
Williams, 1989; Smith, 1991) have suggested that teachers are 
influenced by their perceptions of the content of externally man- 
dated tests, especially when the test results are viewed as having 
important consequences for them or their students. In particular, 
the research suggests that teachers tend to narrow their instruction 
by giving a disproportionate amount of their time and attention to 
teaching the specific content most heavily tested, rather than 
teaching underlying concepts or overarching principles, or rather
than teaching untested or less tested areas (e.g., geometry, statis- 
tics) that are also expected to be part of the curriculum. Other 
studies (e.g., LeMahieu & Leinhardt, 1985) have found that the role
of the teacher as instructional decision maker is often influenced 
by the perceived content of commercial standardized or other 
externally mandated tests. In this way, many critics charge that the 
widespread use of multiple-choice testing contributes to a "dumbing 
down" of instruction, in which skills are taught only in the form 
required for the test rather than for more realistic or natural 
applications (Darling-Hammond & Wise, 1985). 

Beyond the influence that "teaching to the test" may have in 
shaping instructional practice, it can also diminish the value of the 
information obtained from testing. As Shepard has noted, "The 
more we focus on raising test scores, the more instruction is 
distorted, and the less credible are the scores themselves" (1989, p. 
9). Rather than serving as accurate indicators of student knowledge 
and performance, the tests become indicators of the amount of 
instructional time and attention paid to the narrow range of skills 
and competencies assessed. 

It should come as no surprise that these criticisms and 
limitations of commercial standardized tests decrease the useful- 
ness of the information obtained from such tests for instructional 
guidance. At best, commercial standardized mathematics achieve- 
ment tests can provide teachers with some general within-student 
information (a student's mathematics achievement in comparison 
with his or her achievement in other subject areas) and across- 
student comparisons of mathematics achievement on the tested 
content. Within a school or district, teachers, administrators, and 
counselors might use several years of test results as a crude 
indicator of improvement or decline in performance over time. 
Such monitoring might detect gross changes that could be useful in 
directing instructional attention at needed targets. For example, a 
pattern of very low performance on certain subsections of a test 
could suggest areas that are not receiving adequate amounts of 
instructional attention. Armed with this information, administra- 
tors and teachers could decide whether or not the importance of the 
mathematics content in those test sections warrants a change in 
the allocation of instructional resources. Nevertheless, because of 
the limited scope of the content and item formats on these tests, it 
is unlikely that the information produced by commercial standard- 
ized tests will be very helpful in providing detailed instructional 
guidance for teachers, administrators, or supervisors trying to 
move mathematics programs in the direction of the National 
Council of Teachers of Mathematics' (1989) Curriculum and Evalu- 
ation Standards. 

One area in which standardized test information is often used 
for instructional guidance is in student placement. For example, 
the results of commercial standardized tests have long been used to 
determine student eligibility for placement in special programs 
such as grade 8 algebra or Chapter 1 remedial instruction. In the case 
of Chapter 1, the tradition has been to use results of commercial, 
norm-referenced standardized tests as the sole criterion for place- 
ment. Recent changes in federal laws, however, now allow school 
personnel to use other methods of assessment, such as performance 
tasks and interviews (Stenmark, 1989). These new regulations 
should allow administrators, counselors, and teachers responsible 
for making such placement decisions to supplement the limited 
information available from commercial standardized tests and to 
use multiple sources of assessment information to make the best 
instructional decisions for the target students. 

State-Level and District-Level Tests. In addition to providing 
class time for the administration of commercial standardized tests, 
an increasing number of teachers are finding that additional time 
must be provided for the administration of state-level and district- 
level assessments. In response to actual or perceived demands for 
public accountability, many states and school districts have devel- 
oped their own tests for many purposes, including monitoring 
individual student progress, evaluating program effectiveness, iden- 
tifying students in need of remedial assistance, or certifying compe- 
tence for high school graduation. Because of the difficulty of con- 
structing good assessment measures, these tests have often been 
modeled after commercial standardized tests, with an exclusive use 
of multiple-choice formats. Furthermore, commercial test publish- 
ers are frequently called upon to develop state-level and district-level 
achievement tests. Hence, these local tests have a high probability 
of inheriting the limitations of commercial standardized tests. 

State and local standardized tests, however, usually differ from 
commercial standardized tests in one important aspect. In contrast 
to commercial standardized tests, which must be developed without 
reference to any particular set of curriculum objectives, state and 
local tests can be designed to reflect a specific set of subject-matter 
objectives. Because of this ability to relate test items to particular 
objectives, many state and local tests are developed as criterion- 
referenced tests. In contrast to the commercial standardized, norm- 
referenced tests, for which a student's performance is compared to 
that of a national comparison group, student performance on crite- 
rion-referenced state or local tests can be described in reference to a 
particular objective or set of objectives. 

Although all tests involve students responding to a set of 
questions, the analysis and use of the information obtained from 
the testing is somewhat different in the various types of state-level 
and district-level testing programs. In some cases, information is 
gathered to report at a classroom, school, or school district level the 
proficiency of the student population with respect to specific 
objectives. Results from these tests are usually reported in terms of 
the percentage of students (in the class, school, or district) who 
scored at a specified level. For example, if the passing level for 
measurement concepts is set at the 70 percent level (i.e., a student 
correctly answers 70 percent of the items testing that objective), 
then the results will be reported in terms of the percentage of 
students who achieved that level. In some cases the test may be 
designed so that there are different forms, each of which contains 
only a few of the questions for a particular subject at a particular 
grade level. Because different students may not answer the same or 
comparable questions, it will be impossible to derive from such a 
test information about relative performance at the student level; 
hence, program-level reporting is preferable. 
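
The reporting arithmetic described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the student identifiers, the item scores, and the function name are hypothetical, and only the 70 percent passing level comes from the example in the text.

```python
from typing import Dict, List

PASSING_PERCENT = 70  # passing level used in the example in the text


def percent_at_passing_level(item_scores: Dict[str, List[int]],
                             passing_percent: int = PASSING_PERCENT) -> float:
    """Return the percentage of students who reached the passing level.

    item_scores maps a (hypothetical) student ID to that student's 0/1
    scores on the items testing a single objective.
    """
    if not item_scores:
        raise ValueError("no students to report on")
    passed = 0
    for scores in item_scores.values():
        # Integer comparison avoids floating-point edge cases at exactly 70%.
        if scores and 100 * sum(scores) >= passing_percent * len(scores):
            passed += 1
    return 100.0 * passed / len(item_scores)


# Hypothetical class: four students, ten measurement items each.
measurement_scores = {
    "student_a": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # 70 percent correct: passes
    "student_b": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # 60 percent correct
    "student_c": [1] * 10,                        # 100 percent correct: passes
    "student_d": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # 50 percent correct
}

print(percent_at_passing_level(measurement_scores))  # 50.0
```

Note that the result describes the group, not any one student, which is consistent with the program-level reporting discussed above.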

Results from these program-centered tests can be of value in 
the instructional planning of a classroom teacher, especially be- 
cause the state- or district-developed set of test objectives is likely 
to be closer to those that constitute the curriculum in that indi- 
vidual teacher's classroom. For example, a teacher can observe that 
students in his or her district are performing well on items that 
involve the solution of linear measurement problems, but that 
these same students are not doing as well on items that involve area 
measure. Knowing that area measurement is a concept that may be 
difficult for the students in that classroom, the teacher can consider 
adjusting the pacing of the unit by increasing the time spent on area 
within the measurement unit or adjusting the teaching method to 
one that uses more concrete examples to build a conceptual bridge 
from linear measure to area measure. Curriculum developers and 
mathematics supervisors can also use information from state- and 
district-level tests to influence curriculum revision and staff devel- 
opment activities. 

For some state- or district-level tests, results are reported by 
individual student and used to make basic educational and instruc- 
tional decisions, usually related to students whose scores do or do 
not exceed some predetermined "cut score." For example, mini- 
mum-competency tests are often used for placement into remedial 
instruction programs; passing a "graduation test" often determines 
whether a student earns a high school diploma; some states even 
have tests that a student must pass to be promoted to the next grade 
(Airasian, 1991). Results from these tests obviously provide infor- 
mation upon which instructional decisions are made: students 
are assigned to remedial programs or required to "repeat" a course 
of instruction. At the classroom level, the tests may also provide 
some information for the teacher. For example, knowing an 
individual student's strengths and weaknesses on a minimum- 
competency test could provide information a teacher could use to 
assign extra work or adjust instructional approaches or assign a 
student to a cooperative learning group. Aggregated information 
about group performance on certain topic areas may also provide 
some information that could inform decisions about the allocation 
of time to different units of instruction or the pacing of lessons. 
However, it is important to remember that the value of the 
information to educational professionals who are trying to move 
classroom instruction in the direction of mathematics education 
reform depends heavily on the relationship between the reform 
vision and the actual test content and format. If the test content 
reflects a narrow conception of mathematics or if the test format 
samples narrowly from the wide range of mathematical perfor- 
mances, it is unlikely that such a test can provide useful informa- 
tion for instructional guidance. 

The mismatch of test content and a specific vision of math- 
ematical proficiency becomes even more problematic because of 
the ways in which test scores are often used. It is not at all 
uncommon for the scores to be used as the basis for comparisons 
among schools. In fact, lists of schools and their average student 
performance are often published on the pages of the local newspa- 
per, thereby inviting comparisons between high-scoring schools 
and low-scoring schools, without regard for the many other factors 
that would have to be taken into account in making valid compari- 
sons. An even more dangerous, and unfair, practice is the use of 
such test scores for the evaluation of individual teachers within 
schools, a use for which the scores were never intended. As a means 
of obtaining information for use in making instructional decisions, 
these practices are dangerous because they can lead administrators 
to pressure teachers to shape their instruction toward the test 
content and test format, as discussed earlier in the context of 
commercial standardized tests. The words of one teacher sum up 
these concerns: "My principal puts a great deal of emphasis on our 
school's performance on the state-mandated basic skills test. He's 
very concerned about how we do compared to neighboring schools 
when the results are published in the local paper. The emphasis is 
to include in instruction topics on the test but [that are] not yet in 
our curriculum and to give tested topics more instructional time" 
(Airasian, 1991, p. 361). 

For teachers and other decision makers in states or districts 
that use tests based on appropriate mathematics content and that 
report and use scores fairly, these tests can provide some informa- 
tion for the purposes of instructional guidance. Looking beyond 
state- and district-level boundaries, mathematics teachers may 
also gain some helpful information from national and international 
surveys of students' mathematical achievement. 

National and International Assessments 

Recent proclamations (e.g., Goals 2000) concerning the dire state 
of students' mathematics achievement in the United States and the 
need for massive improvement to reach "world class standards" of 
mathematics performance have been stimulated by data obtained 
from national and international surveys of mathematical knowl- 
edge. Much of the attention given to this issue has focused on the 
need to create national standards for mathematics proficiency and 
a national testing system to measure students' attainments. If and 
when a national testing system is designed and implemented, the 
data generated from its implementation may be useful for instruc- 
tional guidance in some ways, especially if the assessment tasks 
used in the test embody the reform vision of mathematical profi- 
ciency. At this time, however, education professionals seeking to 
use such information for instructional guidance are limited to 
data available from several existing surveys of mathematics 
knowledge. Although these surveys have had some impact in 
mobilizing public concern about mathematics achievement, the 
remainder of this section deals with ways in which information 
generated in such surveys might be useful to those making in- 
structional decisions. 

International Surveys. Beyond sounding alarms because of the 
poor performance of U.S. students compared to their counterparts 
in other countries — alarms that some critics (e.g., Husen, 1983; 
Rotberg, 1990, 1991) think may be inappropriate and unneces- 
sary — international comparisons offer useful benchmarks against 
which to gauge the performance of students in this country. The 
information available from international assessment surveys is 
certainly interesting, yet its direct relevance to the daily instruc- 
tional decisions made by classroom mathematics teachers is prob- 
ably minimal. Due to design constraints of an assessment that has 
to consider students at a variety of grade levels from many nations, 
the information may be far removed from the content or focus of 
any particular teacher's instructional program. 

Despite these limitations, and whether or not their own 
students participate directly in the testing, teachers and other 
instructional decision makers can get a global view of student 
knowledge and performance in mathematics and may be able to 
glean some information useful in instructional guidance from 
these international performance surveys. For example, reports of 
the results of international surveys such as the Second Interna- 
tional Mathematics Study (SIMS) (McKnight et al., 1987) and the 
International Assessment of Educational Progress (IAEP) (Lapointe, 
Mead, & Phillips, 1989) contain sample items and further informa- 
tion and commentary that might help educators address instruc- 
tional issues related to student achievement in various content 
areas (e.g., algebra, geometry, probability, and statistics) and in 
specific ability areas (e.g., problem solving). 

In the IAEP, sample items were linked to levels of mathemat- 
ics proficiency, as shown in figure 3.1. One way in which middle 
school mathematics teachers might use the IAEP proficiency levels 
and sample items would involve having students respond to the 
sample items and then looking at student performance with re- 
spect to five proficiency levels. Another option might be for a group 
of teachers to meet and examine the content categories included in 
international assessments and then compare them to content 
categories that form the local mathematics curriculum. 

Beyond these opportunities for teachers to use specific results 
or performance summaries, it is also possible that teachers or other 
instructional decision makers would be able to use information 
related to more general trends or explanations that emerge from 
such surveys. For example, based on a review of the results from an 
international survey, a mathematics supervisor might make an 
impact on instruction by planning staff development sessions on 
certain relevant findings, or a curriculum developer might modify 
some existing curricular units. Moreover, based on consideration 
of the general findings from SIMS to the effect that the mathemat- 
ics curriculum in the United States lacked focus and depth 
(McKnight et al., 1987), district or state mathematics supervisors 
might engage teachers in efforts aimed at curriculum development 
or topical rearrangement to address the problem. 

National Assessments. The National Assessment of Educa- 
tional Progress (NAEP) is a general source of mathematics achieve- 
ment information that may have direct relevance to classroom 
teachers, curriculum developers, mathematics supervisors, and 
other mathematics instruction decision makers in the United 
States. There have been six such national assessments in math- 
ematics, with the results from the last NAEP released in April 
1993. The purpose of the NAEP mathematics assessment is to 
provide a general picture of what students "know and can do" in 
mathematics. The official reports from NAEP (e.g., Dossey, Mullis, 
Lindquist, & Chambers, 1988; Mullis, Dossey, Owen, & Phillips, 
1991, 1993) summarized trends in mathematics proficiency using 
a representative national sample of students, currently from grades 
4, 8, and 12, with respect to mathematical processes or abilities (e.g., 
conceptual understanding, problem solving) and mathematical 
content (e.g., numbers and operations, geometry) that are common 
foci in the school mathematics curriculum. In addition to the 
process- and content-based results, the NAEP reports contained 
examples of items that appeared on the instrument, information on 
programs and practices in mathematics instruction, and results 
from questionnaires on students' attitudes toward mathematics 
and teachers' mathematics classroom practices. 

Figure 3.1. Proficiency levels and sample items from the 1988 International Assessment of Educational 
Progress (IAEP). (Lapointe, Mead, & Phillips, 1989. Used with permission.) 

Like the international assessments, results from the NAEP 
mathematics assessments are not likely, because of the generality 
of the survey, to be directly helpful in providing detailed guidance 
to teachers making instructional decisions in their classrooms. 
Nevertheless, the general reports of results may provide a fairly 
good source of information for teachers, mathematics supervisors, 
and curriculum developers. More extensive reports of NAEP re- 
sults, including further analyses of student performance and a 
greater variety of sample items, are often undertaken by NCTM 
task forces. For example, for the 1986 mathematics assessment, the 
NCTM published a book, Results from the Fourth Mathematics 
Assessment of the National Assessment of Educational Progress 
(Lindquist, 1989), and a series of articles written for teachers, which 
appeared in Arithmetic Teacher and Mathematics Teacher (e.g., 
Brown et al., 1988; Kouba et al., 1988). 

These latter reports and articles are much more specific about 
test items and possible interpretations of students' performance 
than the general NAEP publications. By presenting sample items, 
together with information about the percent of students who 
attempted an item and answered it correctly, these secondary 
analysis reports may provide more helpful information for instruc- 
tional planning. For example, consider the graphical interpretation 
and interpolation item shown in figure 3.2. In the official report of 
the 1986 NAEP mathematics results (Dossey et al., 1988), details 
regarding student performance on this item are not given, and 
performance is discussed only generally with respect to a predeter- 
mined proficiency level of 300 that involves the ability to "demon- 
strate more sophisticated numerical reasoning, and ... to draw 
from a wider range of mathematical skill areas" (p. 39). In contrast, 
the secondary analysis of this item (Brown & Silver, 1989) reported 
that 41 percent of the grade 7 students and 70 percent of the grade 
11 students answered it correctly. Using this kind of information, 
some teachers might choose to administer this or other "released" 
NAEP items to their students and then compare their class results 
to results from the national sample. Additionally, by comparing 
the percentage of correct figures across grade levels, teachers and 
other instructional planners and designers can gauge the effects of 
exposure to the mathematics curriculum and student growth in 
mathematics with respect to certain content areas and mathemati- 
cal abilities and identify curricular or process areas that may need 
greater instructional emphasis. 

Refer to the following graph. This graph shows how far a 
typical car travels after the brakes are applied. 

[Graph not reproduced] 

A car is traveling 55 miles per hour. About how far will it 
travel after applying the brakes? 

□ 25 feet 
□ 200 feet 
□ 240 feet 
□ 350 feet 
□ I don't know 

Figure 3.2. Graphical interpretation item from the Fourth Mathematics 
Assessment of the National Assessment of Educational 
Progress (NAEP). 
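
The class-versus-national comparison described above amounts to simple arithmetic, sketched below. The national percentages (41 percent at grade 7, 70 percent at grade 11) are the figures reported in the text for the figure 3.2 item; the class responses, the option labels, and the answer key are hypothetical placeholders, since the keyed answer is not given here.

```python
from typing import Dict, List

# Percent of the national sample answering the figure 3.2 item correctly,
# by grade, as reported in the text.
NATIONAL_PERCENT_CORRECT: Dict[int, float] = {7: 41.0, 11: 70.0}


def percent_correct(responses: List[str], answer_key: str) -> float:
    """Percentage of a class whose response matches the answer key."""
    if not responses:
        raise ValueError("no responses to score")
    correct = sum(1 for r in responses if r == answer_key)
    return 100.0 * correct / len(responses)


def difference_from_national(responses: List[str], answer_key: str,
                             grade: int) -> float:
    """Class percent correct minus the national figure for that grade."""
    return percent_correct(responses, answer_key) - NATIONAL_PERCENT_CORRECT[grade]


# Hypothetical grade 7 class of 20 students; "C" is a placeholder key,
# not the actual keyed answer to the NAEP item.
class_responses = ["C"] * 10 + ["B"] * 6 + ["D"] * 3 + ["E"] * 1

print(percent_correct(class_responses, "C"))              # 50.0
print(difference_from_national(class_responses, "C", 7))  # 9.0
```

A difference computed this way is at most a rough signal: a single class is a small group, and its students may differ in many respects from the national sample.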

In the near future, it is likely that the NAEP will continue to 
be a major source of national assessment information. In fact, 
recent changes in the NAEP have increased the likelihood that its 
information will be considered as a source for instructional guid- 
ance. By instituting a state-by-state reporting system, and by 
classifying student achievement both with respect to proficiency 
scales (Mullis, 1990) and achievement levels (National Assessment 
Governing Board, 1991), NAEP is moving toward a more visible test 
and one for which the performance stakes are inching higher. 
As the country moves toward a national testing system, it is likely 
that additional sources of national test information will become 
available. 

As national testing becomes more visible and more impor- 
tant, greater pressure will almost certainly be exerted on educators 
at the state, school district, school building, and classroom levels 
to use the results as the basis for instructional decisions. With these 
national data being discussed as sources for instructional decisions, 
it will be important to examine the relationship between test 
content and task format and the standards for mathematics cur- 
riculum and assessment outlined in the NCTM Standards and in 
other mathematics education reform documents. An analysis of 
the relationship between the NCTM Standards and the content of 
the grade 8 NAEP mathematics assessment revealed that only 
about half of the test items were related to the NCTM themes of 
problem solving, reasoning, and communication (Silver, Kenney, 
& Salmon-Cox, 1992). Moreover, the analysis also showed that 
only one item was related to the NCTM Standards theme of 
mathematical connections, that most of the test items were mul- 
tiple-choice rather than constructed-response tasks, and that the 
items involving calculator usage were unimaginative. Certainly, 
one would want to consider these features of the NAEP test and be 
cautious in making instructional decisions and recommendations 
based on student performance on this test. 

Summary 

Despite the presumption that externally mandated testing should 
be useful in providing information to improve instruction and 
learning, experience and research suggest that this is not often the 
result. In fact, our analysis of these external assessments as sources 
of information for instructional guidance has indicated the limita- 
tions of these tests for providing such information. Whether from 
international, national, state, or local testing, the results of exter- 
nal assessments offer only limited information on which to base 
instructional decisions. Because these assessments are far removed 
from the classroom environment, their results are of minimal 
utility for detailed instructional guidance, especially for the kinds 
of interactive decision making (Borko & Shavelson, 1990) that 
characterize teaching on a daily basis. When suitable at all, the kind 
of information available from these assessments appears to be best 
suited for general, long-range planning at the school, district, or 
state levels. Nevertheless, to the extent that these tests fail to 
include broad coverage of rich mathematical content and rely 
only on short-answer and multiple-choice formats, the informa- 
tion will not be very useful to educational professionals who are 
seeking to reform mathematics instruction. 

In the next section, we turn our attention to richer sources of 
information for interactive decision making and instructional 
guidance. These sources of information exist in the instructional 
activities of a mathematics classroom in which students are en- 
gaged in the performance of substantive, authentic activities and 
are reached through use of instructionally embedded assessments 
that can provide a portrait of students' mathematical proficiencies 
and competencies, a portrait that becomes evident through obser- 
vation and evaluation of students as they engage in the perfor- 
mance of the activities. 

Assessment Information Embedded in Instructional Practice: Sources and Resources 

As we have seen, externally mandated tests impinge upon the time 
available for mathematics classroom instruction, yet the student 
performance results are quite limited as guides for instructional 
decisions. In fact, some have argued that the tests are largely 
superfluous, confusing rather than enhancing teachers' judgments 
and evaluations. As Hill (1991) has noted: "The teacher is closest 
to student performance, observes it daily, and assesses it constantly 
to make instructional decisions. It is doubtful that any [external] 
measure can tell us a fraction of what a teacher ferrets out in the 
process of instruction" (p. 4). 

Earlier in this chapter, we discussed the widespread belief that 
externally mandated testing has negative effects on classroom 
instruction. In light of that belief, it is somewhat surprising that 
surveys of teachers and students have consistently indicated that 
they believe the educational and psychological effects of classroom 
evaluation are generally substantially greater than the correspond- 
ing effects of standardized testing (Dorr-Bremme & Herman, 1986; 
Haertel, 1986; Kellaghan, Madaus, & Airasian, 1982; Salmon-Cox, 
1981; Stiggins & Bridgeford, 1985). Although externally mandated 
testing may have a focused, short-term effect on classroom instruc- 
tion, apparently teachers and students see the cumulative effects of 
ongoing classroom evaluation as having a greater impact on the 
learning that does or does not occur and the feelings of satisfaction 
that do or do not result. 

Research has shown that a wide range of evaluative activities 
takes place in classrooms, with different patterns at different grade 
levels and in different subject areas (Fennessy, 1982; Gullickson, 
1986; Stiggins & Bridgeford, 1985). Activities include evaluation 
through teacher questioning and class or group discussion, mark- 
ing or commenting on performances of various kinds, checklists, 
informal observation of learning activities, written exercises of 
various kinds (including projects, assignments, worksheets, and 
text-embedded questions), and teacher-made tests. Although tests 
and testlike activities constitute only a fairly small component of 
the total set of evaluation activities in a course, the impact of 
classroom testing has been studied more extensively than other 
forms of classroom evaluation. Some studies (Dorr-Bremme & 
Herman, 1986; Haertel, 1986) have estimated that formal tests 
occupy about 5 percent of students' time at the elementary school 
level and about 15 percent at the secondary school level. Math- 
ematics and science teachers, however, have tended to rely more 
heavily on paper-and-pencil objective tests, whereas teachers in 
other subjects are reported to rely more on structured observations 
and professional judgments (Stiggins & Bridgeford, 1985). 

Teachers judge evaluative activities to be important aspects 
of teaching and learning, but they are often concerned about the 
perceived inadequacy of their efforts (Gullickson, 1986; Stiggins & 
Bridgeford, 1985). With respect to the evaluation of mathematical 
problem solving, Silver and Kilpatrick (1988) argued that attempts 
to control the content and form of teachers' instruction have had 
the consequence of deskilling teachers by "convincing them that 
they lack the expertise to assess how their students are learning and 
thinking" (p. 185). 

It is unlikely, however, that a better situation will result from 
a requirement that teachers make an effort to learn more about the 
theory of educational measurement. Although it is true that a 
substantial proportion of teachers, especially elementary school 
teachers, have little or no formal training in educational measure- 
ment techniques, it is equally true that many of those who do have 
such training find it of little relevance to classroom evaluation 
activities (Gullickson, 1986; Gullickson & Ellwein, 1985; Haertel, 
1986; Stiggins, 1985). Approaches that tie assessment practices to 
instructional goals and activities in reasonable ways are likely to be 
more productive. As Silver and Kilpatrick have noted: "What is 
needed are serious efforts to re-skill teachers, to provide them with 
not only the tools such as sample problems and scoring procedures 
that they can use to construct their own assessment instruments 
but also with the confidence they so often lack in their own ability 
to determine what and how their students are doing in solving 
mathematical problems" (p. 185). 

The linking of assessment practices to instructional goals 
seems especially important when we consider the research findings 
on the content of teacher-made tests and contrast that with the 
current thinking about the important goals for school mathematics 
instruction. Most analyses of the content of teacher-made tests 
have found that the vast preponderance of questions require low- 
level knowledge and performance on the part of the students. For 
example, Fleming and Chambers (1983) analyzed 8,800 test ques- 
tions in twelve grade and subject area combinations (elementary to 
high school) and found that almost 80 percent of all questions were 
at the "knowledge" level in Bloom's taxonomy. Even in classrooms 
in which teachers reported instructional goals involving higher 
level thinking, Haertel (1986) found that "classroom examinations 
often failed to reflect teachers' stated instructional objectives, 
frequently requiring little more than repetition of material pre- 
sented in the textbook or class, or solution of problems much like 
those encountered during instruction" (p. 2). 

To some extent the limitations in the form of classroom 
testing may be attributed to the influence of externally mandated 
tests. Mathematics teachers often create or use multiple-choice 
and short-answer tests, thereby demanding and evaluating perfor- 
mances from their students only in forms identical to those used on 
standardized tests. At the elementary school level, especially, 
many teachers make extensive use of commercially prepared tests 
that accompany their textbooks. Naturally, these commercially 
prepared tests neither reflect the instructional nuances of any 
particular teacher's class nor utilize a rich variety of task formats. 
Even when textbook tests are not used, teachers often emulate the 
"tests that really count" and utilize multiple-choice and short- 
answer formats in their classroom assessment (Fleming & Cham- 
bers, 1983; Stiggins & Bridgeford, 1985). 

Other reasons have been suggested to account for the poor 
quality of the content represented in classroom teachers' tests, 
including the difficulty of writing tasks (especially short-answer or 
multiple-choice test items) to assess higher level skills (Elton, 1982),
the ease with which teachers can defend their grading of students'
responses to lower level factual recall items and the resulting higher
reliability (Natriello, 1987), and the belief that higher level questions
may lead to confusion and frustration on the part of students (Doyle,
1986). Although teachers may have good reasons for orienting their
tests primarily toward lower level knowledge and performance, a
conflict is likely to exist between these kinds of assessment practices
and the objectives of a mathematics curriculum oriented toward 
higher level thinking, reasoning, and problem solving. 

As mathematics classrooms move toward the realization of 
the vision portrayed by the NCTM Curriculum and Evaluation
Standards for School Mathematics (1989) and the Professional 
Standards for Teaching Mathematics (1991), they will become 
environments in which teachers and students work together on 
making mathematics and on the active exploration of mathemati- 
cal ideas. As the teaching of mathematics shifts “from an authori- 
tarian model based on 'transmission of knowledge' to a student- 
centered practice featuring 'stimulation of learning' " (National 
Research Council, 1989, p. 81), mathematics programs will involve
students in a wide variety of activities, such as

■ working collaboratively;

■ asking and answering questions posed by fellow students 
or the teacher; 

■ engaging in substantial discussions about mathematics; 

■ thinking hard about what they are learning and about the 
nature of mathematics; 

■ working on extended projects that may take days, or even
weeks, to complete; 

■ solving worthwhile and challenging problems on teacher-
made mathematics tests and on homework assignments.

These and other activities have the potential to function as
instructionally embedded sources of assessment information
that can be used for instructional guidance as well as for
summative evaluation of students' achievement. The next section
of the chapter discusses suggested sources of instructionally em- 
bedded assessment opportunities within the venues of classroom 
discourse and activities and the direct performance of mathe- 
matical tasks, as well as the kinds of information that these 
classroom assessments can provide for teachers and other decision 
makers. 



ASSESSMENT EMBEDDED IN CLASSROOM 
DISCOURSE AND ACTIVITY

The NCTM Professional Standards for Teaching Mathematics
(1991) places a heavy emphasis on the role of discourse in facilitat-
ing students' learning of mathematical ideas: "The discourse of a
classroom — the ways of representing, thinking, talking, agreeing 
and disagreeing — is central to what students learn about math- 
ematics as a domain of human inquiry. . . . Students must talk, 
with one another as well as in response to the teacher. When the 
teacher talks most, the flow of ideas and knowledge is primarily 
from teacher to student. When students make public conjectures 
and reason with others about mathematics, ideas and knowledge 
are developed collaboratively, revealing mathematics as constructed 
by human beings within an intellectual community. Writing is 
another component of the discourse" (p. 34). In classrooms that are 
moving toward an embodiment of the NCTM Standards, the 
classroom discourse — between teacher and student and among 
students — is centered around worthwhile mathematical tasks, and 
the intellectual activity of the students provides a rich environ- 
ment from which assessment information can be obtained. A 
teacher can gain valuable information for instructional guidance by 
watching students as they work on mathematical tasks, by observ- 
ing students working in pairs or in groups, by asking appropriate 
questions at opportune moments, and by listening to students 
present their answers or solutions, their approaches or methods, 
and their explanations or justifications. 

Although the activity and the discourse can certainly serve as 
a rich source for teachers' interactive instructional decisions, 
because discourse and intellectual activity in the classroom are
ephemeral entities, much of the assessment information that could
be gained is likely to go unrecorded and therefore remain unavail-
able for instructional planning and long-range decision making. In
this section we give some attention not only to a few forms in
which assessment information might be extracted from naturally 
occurring classroom discourse and intellectual activity but also to 
some ways of preserving records of the discourse and student 
activity in order to facilitate instructional planning. 

Observation. Watching students while they are "doing math-
ematics" can provide insights into their understandings and
misunderstandings. Observation is almost certainly the most 
basic classroom process of gathering assessment information 
about students. Information gained from careful observation is 
regularly used by teachers to decide whether to move forward in 
a lesson or give more time for completion of a component activity, 
to decide whether to provide an additional example or a different 
explanation, and to modify the expected direction of a lesson or a 
unit of instruction. 

Although the number of students that can be observed at any 
one time is limited, observation is an assessment method that is 
generally comfortable and convenient for classroom teachers be- 
cause it is relatively easy to include as part of regular classroom 
routines, and it is useful for assessing a range of student character-
istics, including performance, attitudes, and beliefs. Although
record keeping can be cumbersome without advance planning,
recording schemes for systematic observation are relatively easy to
construct, and many sources of ideas for observation instruments
for the mathematics classroom exist (e.g., British Columbia Min-
istry of Education, 1990; Charles, Lester, & O'Daffer, 1987;
Stenmark, 1989).

Among the many types of observation instruments that exist, 
two types are particularly appropriate for use by the classroom 
teacher: the annotated class list and the topical list. These instru- 
ments are easy both to construct and administer, but their purposes 
differ somewhat. The annotated class list consists of a roster of 
student names with a space to the right of each name for recording 
a variety of student attributes such as mathematical understanding 
(or misunderstanding), demonstrated attitudes toward mathemat- 
ics, and potential areas in which the student excels or needs 
assistance. The topical list consists of a set of predetermined 
categories to be used during the observation. For example, during
an opportunity to observe students solving problems, a teacher 
might choose these categories as the focus for her observation of 
students:

■ tries to understand the problem,

■ selects appropriate solution strategies,

■ shows a willingness to switch between solution strategies,

■ uses a systematic procedure,

■ shows perseverance,

■ checks work and answer.

Teachers can use information from documented classroom 
observations in a variety of ways to assist in instructional decision 
making. In an unobtrusive way teachers can watch and listen to 
students as they explain their mathematical thinking and work in 
groups, thus gaining a feel for the students' facility with commu- 
nication. In some cases a student might demonstrate more under- 
standing than that indicated on a written test or on homework. 
Teachers can get a sense of how students have processed and 
interpreted information about the area of mathematics under 
consideration — or how they have not processed or misinterpreted 
the information presented. Taking time to observe students at 
work, then, can provide opportunities for timely feedback that can
shape decisions to be made regarding instruction in the next class 
period and beyond. 

Questioning. It is difficult to imagine a mathematics class- 
room without questioning as a central activity, and it is difficult to 
imagine instructional decisions that are not informed in some way 
by students' responses to questions. Questioning — both planned 
and unplanned — can be a source of useful information for student 
assessment related to students' cognitive performance, attitudes, 
beliefs, mathematical insights, and metacognitive processes; and
students' responses to questions can provide a valuable source of 
information for interactive instructional decisions involving ad- 
justment of lesson pacing, example selection, homework assign- 
ments, and so on, as well as for longer term instructional planning 
decisions. 

The idea of gaining assessment information through ques-
tioning students is far from novel. An extensive literature exists on
classroom teacher questioning practices. Carlsen (1991) presents
an excellent review of these studies with respect to the context and
content of teacher questions and the responses of teachers and
students to questions. Much of this work has been done from the
perspective of the process-product paradigm (Mitzel, 1960;
Rosenshine & Furst, 1973), in which student outcomes (usually,
though not exclusively, student achievement) are viewed as a
function of discrete, observable teacher behaviors, or from the 
sociolinguistic perspective (Cazden, 1986; Green, 1983), in which 
teacher questions are viewed as a mutual construction of teachers 
and students, rather than being exclusively the result of teacher 
generation; and the research focus has been on the linguistic 
character of the communication structures and the social dimen- 
sions of the questioning, such as its role in reflecting and reinforc- 
ing authority and social status relationships in the classroom. 

Although they appear to be designed to enhance student 
discourse, teacher questions may sometimes discourage students 
from speaking. For example, Dillon's (1985) analysis of five class- 
rooms showed that teacher questions typically produced terse, 
factual statements by students, whereas noninterrogative expres- 
sions produced lengthier, more syntactically complex responses. He 
concluded that teacher questions in these classrooms had the conse- 
quence of suppressing rather than enhancing student discussion. 

Given the new vision for mathematics instruction in which 
teachers are to pose questions that "elicit, engage, and challenge 
each student's thinking" (NCTM, 1991, p. 35), discourse-inhibit- 
ing questioning such as the kind described by Dillon (1985) is 
unacceptable. Recent publications (e.g., Bennett & Foreman, 1990;
Stenmark, 1989) have included sections on teacher questioning in 
the mathematics classroom and sample questions in the areas of 
problem comprehension (What is this problem about?), relation- 
ships (Is there a pattern?), communication (Could you explain what 
you think you know about this concept right now?), and self- 
assessment (What kind of mathematics problems are still difficult 
for you?). Questions such as these would appear to have promise in 
enhancing student-teacher discourse, thereby providing teachers
with important information for instructional guidance. 

Although direct questioning of students can certainly be a 
source of information for instructional decision making, its level of 
importance may change as the mathematics classroom environ- 
ment evolves to meet the NCTM Curriculum and Evaluation 
Standards and the Professional Standards for Teaching Math-
ematics. As teachers become facilitators rather than interrogators
and as teachers begin to use other kinds of mathematics tasks to 
elicit students' higher order thinking skills, the use of direct 
questioning of students at the classroom level will likely diminish. 

One practice in which teachers' use of direct questioning is 
likely to remain important, however, is that of individual student 
interviews. Structured or semi-structured interviews, in which a
preselected problem situation and a set of probing questions are
used, have long been used by researchers to study students' math-
ematical performance and the extent of their understanding of
mathematical concepts and procedures (e.g., Erlwanger, 1973).
Classroom teachers can also use this form of assessment to collect 
and record detailed information about students' mathematical 
understanding and problem-solving processes. As Peck, Jencks, 
and Connell (1989) have suggested, "Just as student interviews
have been helpful in uncovering conceptual difficulties, they can
be a useful tool for guiding the progress and direction of day-to-day
classroom work" (p. 15).

The key to a successful interview assessment is a well-
designed interview plan. Although such plans may vary depend-
ing upon the problem situation presented, they usually are com-
posed of six steps: establishing rapport, presenting specific in-
structions, presenting the problem, probing for understanding of
the problem, probing for the solution process, and coming to
closure (British Columbia Ministry of Education, 1990; Charles,
Lester, & O'Daffer, 1987). During the course of the interview, two
principles of importance are acknowledged: sufficient time should 
be allowed for the student to formulate a response, and the 
student's thought processes should be of greater importance than 
the answer. 

Whether the questioning situation occurs informally as part 
of regular classroom activity or in a more structured individual or
small group setting, teachers can gain insights into students' 
thinking and communication skills that may not be obviously 
apparent from written work. Questioning and interviews, as forms 
of oral discourse, can also provide diagnostic information so that 
teachers may direct instruction to review concepts that prove to be
problematic or to review briefly those areas in which students have
demonstrated understanding.

Written Discourse. As the earlier quote from the NCTM
Professional Standards for Teaching Mathematics indicated, not
all discourse is oral; various forms of written discourse can also
serve well as sources of information for instructional guidance. For
example, student journals, or students' written responses to spe-
cific probes concerning their learning or their disposition, provide
opportunities to consider students' ideas or their attitudes and
beliefs when making instructional planning decisions.

Rose (1989) suggests that journal writing in mathematics
classrooms can be instrumental in setting up a useful dialogue
between students and their teachers: "Students and teachers find
something to talk about and the classroom becomes more coopera- 
tive and humanized as each see [sic] the other in a new and 
personalized light" (p. 25). The function, content, and format of the 
journal depend upon its intended use as part of classroom assess-
ment. For some purposes, journals might be used as a repository for
students' writing in an expressive mode, in which students "think
aloud on paper" and record their impressions of mathematics 
classroom activity or their learning. In other cases, students might 
be asked to use their journals for transactional writing, in which a 
journal entry serves as a record of specific responses to a teacher's 
questions or provides some designated kinds of information. In 
either case, the journal serves not only as a record of the students' 
thoughts but also as a medium for dialogue with the teacher. A 
powerful dialogue between student and teacher can occur during 
the typical journal-writing sequence: student entries followed by 
the teacher reading, reflecting, and commenting on the entries, 
followed by new student entries, and so on. As teachers use the 
journals to understand the thinking, feelings, or recorded observa- 
tions of their students, they can apply this information when 
making instructional decisions. 

For a variety of reasons, but certainly because journals are 
perceived both to use valuable classroom instructional time 
and to require additional reading and commentary from teachers,
relatively few mathematics teachers use journal writing as a major
instructional activity in their classrooms. Nevertheless, some 
form of written discourse is possible in all mathematics class- 
rooms, and the frequency of occurrence is likely to increase in the 
next decade. As an alternative to full-scale journal writing, teachers 
might regularly have students respond in writing to specific prob-
ing questions, or complete "sentence starters" (Stenmark, 1989)
such as "Today in mathematics I learned ..." or "Of the math
we've done lately, I'm most confused about ..." Clearly, students'
responses could serve as a valuable source of information to guide 
planning for the following lesson or week's work, and if students 
were asked to respond to such probes on a regular basis, this 
information would in all likelihood become an important compo-
nent of instructional decision making and could provide teachers
with important information about how students are thinking and 
feeling about their mathematics classroom experiences. 

Another example of a writing task that provides assessment
information for instructional guidance is one that might be di-
rected toward a specific mathematical concept. In Thinking Through
Mathematics, Silver, Kilpatrick, and Schlesinger (1990) present an 
account of a teacher who, before beginning a unit on geometry, used 
a simple statement ("Tell me everything you know about circles") 
as a means to find out how much her tenth graders remembered 
about this geometric entity. Students' knowledge was found to vary
from remembering simple facts ("A circle is round"), to remember-
ing formulas (A = πr²), to more sophisticated ideas ("It's made up of
a series of arcs that are all connected"). This simple exercise provided 
the teacher with information that allowed her to structure the lesson 
on circles, to identify students who might need additional help on 
this topic, and when she asked the same question again at the end of 
the unit, to identify how the students' understanding of circles had 
changed as a result of instruction. Thus, by giving students a few 
minutes to respond to a simple question or statement at an oppor- 
tune time, teachers may gain important information that can en- 
hance their instructional decision making. 

Research has shown that teachers consider both cognitive and 
noncognitive information in making their instructional planning 
decisions (e.g., Shavelson & Stern, 1981). Through observation,
conversation, and reading students' journals and other writings, 
teachers may be able indirectly to gain valuable information about 
students' dispositions toward mathematics and toward themselves 
as learners of mathematics. This information can also be gained 
more directly through the use of attitude or belief surveys, ex-
amples of which appear in many recent publications (e.g., British
Columbia Ministry of Education, 1990; Charles, Lester, & O'Daffer,
1987; Mullis et al., 1991; Nicholls, Cobb, Yackel, Wood, & Wheatley,
1990; Stenmark, 1989). By having students write responses to a set
of statements (e.g., "In mathematics, memorizing is more impor-
tant than creative thinking"; "I will keep working on a problem
until I get a right answer") in a simple "yes or no" format or with
a more complex scale (Strongly Disagree, Disagree, Neutral, Agree,
Strongly Agree), or by having students respond directly to ques-
tions or statements (e.g., "What is the biggest worry affecting your
work in mathematics class at the moment?" "In mathematics class
I like . . . because . . . "), teachers can view the development of
students' thinking and attitudes from the students' perspective.
Moreover, responding to attitude and belief surveys can foster in
students a tendency to engage in self-assessment, a proactive,
internal process that promotes the development of mathematical
power (Kenney & Silver, 1993).

Information from Direct Assessment of Performance on
Mathematical Tasks

Although the prior section has dealt with observation and 
discourse that occurs around and about mathematical tasks 
being used in the classroom, the discussion thus far has focused 
more on the nature of the discourse itself and less on the tasks 
and the criteria for judging performance. In this section, we 
consider the instructional guidance information provided by 
various types of mathematical classroom tasks and the associ- 
ated judgment criteria. 

Projects and Open-Ended Problems. In contrast to the short-
answer or multiple-choice questions that typically make up math-
ematics achievement tests, opportunities for students to engage in
extended exploration of mathematical ideas and situations are 
provided by projects, investigations, and open-ended problems. As 
such, they also provide opportunities for teachers to assess their 
students' abilities to formulate problems, to apply their knowledge 
in novel ways, to generate interesting solution approaches, and to 
sustain intellectual activity for an extended period of time. More- 
over, by working on an extended investigation of an interesting 
mathematical problem, students participate in activities that are 
closely related to the nature of complex performances outside of 
school, through which they can develop an understanding that the 
analysis of complex problems may take days or even weeks to 
explore; they can learn to work independently or collaboratively on
a large project; and they can experience the process of producing a 
written or oral report of their work over an extended period of time. 
Open exploratory tasks have been frequently used in mathematics 
instruction in other countries — for example, Australia (Clarke,
1988) and Great Britain — but they are infrequently used in the
United States (Dossey et al., 1988; Mullis et al., 1991). As ideas
central to mathematics education reform become more prevalent
in mathematics teaching, one would expect their use to increase.

An example drawn from one use of mathematical investiga- 
tions in Australia may illustrate the value of this kind of task in 
mathematics teaching. Stephens and Money (1993) reported the
use of investigation and open-ended problems by the Victoria
Curriculum and Assessment Board as part of its external examina-
tion of students for high school credit in their mathematics courses.
One component of the examination is an investigative project
representing fifteen to twenty hours of student work, and another
component is a challenging problem chosen from a set of four
problems and requiring about six to eight hours of student work. 
For our purposes here, what is interesting about this example is the 
fact that such problems are viewed by this external examination 
board as providing important information about students' math- 
ematical attainments and that these novel tasks are administered 
by classroom teachers, who are required to allow students to work 
on the investigative project for seven to ten class periods and who 
then score the students' work using grading criteria organized 
around three aspects of the work: 

■ Problem definition (clear definition of requirements, as- 
sumptions, variables, and identification of the nature of the 
solution being sought); 

■ Solution and justification (production of a solution, appro- 
priate use of mathematical language; accuracy, interpreta- 
tion of results; depth of analysis; quality of justification of 
solution); 

■ Solution process (relevance of mathematics used; genera- 
tion and analysis of appropriate information; recognition of 
relevance of embedded findings, refinement of problem 
definition). 

The list of criteria suggests the kinds of information that 
teachers could obtain from the use of such tasks as part of math- 
ematics instruction. In fact, according to Stephens and Money 
(1993), the teachers involved with the external examination pro- 
gram use the assessment criteria and the related grade descriptors 
as a basis for planning their teaching program. One can easily 
imagine that teachers in the United States could use similar criteria 
to guide instruction, even if the tasks were not part of an external
examination program but rather were generated and implemented 
by mathematics teachers. 

Especially, though not exclusively, for younger students, it 
may be desirable to have the results of projects and open-ended 
problem investigations reported orally as well as, or rather than, in 
writing. The combination of oral and written presentations would 
allow for a more thorough evaluation of students' mathematical 
performances, and similar criteria could be used to evaluate oral 
and written presentations. In this way, even elementary school 
teachers would be able to obtain assessment information about 
students' mathematical thinking that could help guide instruc-
tional decisions.

The publication of a number of sources of open-ended prob-
lems and projects (e.g., Shell Centre for Mathematics Education,
1984; Souviney, Britt, Gargiulo, & Hughes, 1990; Trowell, 1990),
and some published discussions of the pedagogical rationale for
activities such as open-ended problems (e.g., Silver & Adams, 1987;
Silver, Kilpatrick, & Schlesinger, 1990; Silver & Mamona, 1990)
provide mathematics teachers with a base from which to include
provide mathematics teachers with a base from which to include 
these kinds of activities in instruction. Once included as a regular 
feature of mathematics teaching, the activities should become a 
valuable asset in obtaining student performance information to 
guide instruction. 

Classroom Testing. Written tests are commonly used in class-
rooms to assess each individual student's achievement in math-
ematics. As noted earlier, research has shown that formal class-
room testing occupies a substantial portion of instructional time (5
to 15 percent). In addition to their role in providing summative
information on student achievement, classroom tests also repre-
sent a major source of formative feedback that might be useful in 
guiding instructional decisions. Unfortunately, their value as in- 
structional guides is limited if, as has been indicated in the research 
discussed earlier, the tests constructed by mathematics teachers 
tend to make heavy use of short-answer and multiple-choice 
formats. Moreover, an excessive emphasis on short-answer
questions creates the dual impression that only the final answer 
matters and that what is valued in mathematics is the ability to 
answer many questions quickly, almost certainly as a result of 
having memorized many facts and procedures that can be recalled 
rapidly and applied flawlessly. However, to the extent that teachers 
and other instructional decision makers come to view mathemati- 
cal activity in a manner consistent with reform documents like the 
NCTM Curriculum Standards and the Professional Standards for 
Teaching Mathematics, as a process primarily involving reflective
reasoning, problem solving, and communication about mathemati- 
cal ideas, it is clear that classroom testing will need to include 
formats other than questions requiring only short answers or the 
choice of one answer from a set of options.

Teachers interested in diversifying their classroom testing
might include a project or an open-ended problem, such as that 
discussed previously, as a "take-home" portion of a test. The 
inclusion of such an activity as a portion of a test would provide a 
teacher with information about aspects of student performance
that could not be made available solely from classroom testing.
Even within the time constraints characteristic of classroom test- 
ing, however, it is possible to include tasks that can provide 
information on students' reasoning, problem solving, and commu- 
nication. This can be accomplished through the use of mathemati- 
cal tasks that can be completed in 5 to 15 minutes, in contrast to 
the 30 to 43 seconds typically available for a response to a multiple- 
choice question, and that bear a "family resemblance" to projects 
and extended open-ended problems that require much longer to 
complete. Such tasks are being used in external testing programs in 
mathematics such as the College Board Advanced Placement (AP)
Test in calculus and in some state-level testing programs (e.g., 
California, Connecticut, Kansas, Maine). The California Assess- 
ment Program (CAP) has used these kinds of problems in its 
mathematics assessment for several years; it has published ex- 
amples of the tasks, the scoring guides, and sample student re- 
sponses in A Question of Thinking (California State Department of 
Education, 1989) and A Sampler of Mathematics Assessment 
(Pandey, 1991). 

An example of this kind of task drawn from the assessment 
developed for the QUASAR project^ may illustrate its value as a 
source of information for instructional guidance. A sample QUA- 
SAR assessment task appears in figure 3.3. A sample answer to 
this task might be "No" with an explanation such as "Yvonne 
takes the bus eight times a week, and this would cost $8.00.
Because the bus pass costs $9.00, she should not buy the pass." It
is possible, however, that a student might answer "Yes" and
provide a logical reason, such as "Yvonne should buy the bus pass
because she rides the bus eight times for work and this costs $8.00.
If she rides the bus on weekends to go shopping, it would cost 
$2.00 or more, and this would be more than $9.00, so she can save 
money with the bus pass." 
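
The arithmetic behind both sample responses can be laid out in a
short sketch. The function names and the figure of two weekend
rides are illustrative assumptions and are not part of the QUASAR
task or its scoring materials; only the fares and Yvonne's weekday
schedule come from the task itself.

```python
# Hypothetical sketch of the arithmetic in the bus pass task.
# Fares come from the task; the helper names are assumptions.
ONE_WAY_FARE = 1.00   # dollars per ride
WEEKLY_PASS = 9.00    # dollars per week


def weekly_rides(round_trip_days, one_way_days, extra_rides=0):
    """Count the bus rides taken in one week."""
    return 2 * round_trip_days + one_way_days + extra_rides


def pass_saves_money(rides):
    """True when paying per ride costs more than the weekly pass."""
    return rides * ONE_WAY_FARE > WEEKLY_PASS


# "No" response: three round-trip days plus two one-way days.
work_rides = weekly_rides(round_trip_days=3, one_way_days=2)

# "Yes" response: assumes two additional weekend rides (illustrative).
with_weekend = weekly_rides(3, 2, extra_rides=2)

print(work_rides, pass_saves_money(work_rides))
print(with_weekend, pass_saves_money(with_weekend))
```

Either conclusion follows correctly from its stated assumptions,
which is why the explanation, and not the "Yes" or "No" alone,
carries the assessment information.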

When these tasks are used as part of external testing pro- 
grams, student responses are typically scored holistically, using a 
scoring guide that provides detailed information about various 
levels of performance in solving a particular problem. In QUASAR, 
for example, students' responses are scored using a scoring guide 
(rubric) that attends to three categories of solution characteristics
(Silver & Lane, 1993):



■ Mathematical knowledge (knowledge of relevant concepts,
procedures and principles; identifying relationships among
problem elements; identifying and executing appropriate
procedures; verifying results; integration of mathematical
ideas),

■ Strategic knowledge (appropriate use of mathematical
models, including diagrams and symbols; use of appropri-
ate problem-solving strategies; systematic application of
strategies),

■ Communication (appropriate expression of mathematical
ideas in words, mathematical symbols, or pictorial repre-
sentations; reasonable use of vocabulary, mathematical
notation and structure to represent ideas; quality of justi-
fication of a solution).



The table below shows the cost for different bus fares.

BUSY BUS COMPANY FARES

One Way        $1.00
Weekly Pass    $9.00

Yvonne is trying to decide whether she should buy a weekly bus pass.

On Monday, Wednesday, and Friday, she rides the bus to and from work.

On Tuesday and Thursday, she rides the bus to work but gets a ride home with
her friends.

Should Yvonne buy a weekly bus pass?

Explain your answer.

Figure 3.3. Sample QUASAR task.




These categories of evaluation criteria illustrate the kind of
information that can be obtained both for the summative evaluation
of students' achievement and for the purposes of instructional
guidance.

Tasks from programs such as CAP and QUASAR can provide
examples that mathematics teachers can use or modify for other
grade levels. Teachers who use open-ended tasks as part of formal
assessment can derive a wealth of information from student re- 
sponses. As we have seen, they are able to learn whether or not their 
students can recognize the main points of a problem, organize 
information, interpret results, use appropriate mathematical lan- 
guage, and express their own thinking and reasoning processes 
(Stenmark, 1989). However, reading lengthy responses to these
tasks is far different from checking multiple-choice responses or 
scoring the more typical computational responses to mathematics 
problems. Moreover, the development of detailed scoring rubrics 
that set forth requirements for varying levels of performance 
may be essential to ensure high degrees of interrater agreement 
when the tasks are used on external assessments; however,
classroom teachers are unlikely to have the time to create such 
detailed scoring guides. For classroom use, scoring rubrics need 
not be as elaborate or detailed as those used in external testing 
programs, but they can still provide teachers with a mechanism 
for evaluating students' solutions and examining the evidence 
provided in the students' responses to detect clues that might
help guide instructional decision making. For example, a 
teacher might focus on strategy selection and use, and then 
use that information to plan additional instruction or to choose 
examples for the next unit. If such tasks were employed on a 
regular basis, the accumulated information about students' strat- 
egy selection and use, for example, might suggest curricular or 
instructional changes that could be implemented in subsequent 
years. 

Homework and Other Assignments. In addition to their use on
classroom tests, the kinds of tasks discussed in the previous section
can serve as excellent in-class or homework assignments. In what- 
ever capacity they are used, they can provide information that 
teachers can use to plan instruction. To the extent that homework 
and in-class assignments can be used to provide students with a 
wide variety of mathematical experiences — experiences that em- 
phasize mathematical problem solving, reasoning, and communi- 
cation — the information gathered by teachers through an examina- 
tion of students' performance can enrich instructional decision 
making. If the homework and in-class assignments are restricted to
"blue ditto sheets" or "more-of-the-same" exercises to be solved
using well-rehearsed procedures, the information obtained will be
of very limited value for guiding instruction in the direction
suggested by documents like the NCTM Professional Standards
for Teaching Mathematics.

Accumulating Instructional Guidance Information 

In our discussion we have noted several times that much of the 
information obtained from instructionally embedded assessment 
can be especially useful for instructional guidance if it is accumu- 
lated over time. Careful record keeping can help to ensure that 
longitudinal information is accumulated for examination. A tech- 
nique that has been suggested as particularly appropriate for accu- 
mulating instructionally embedded assessment information is the 
mathematics portfolio. In its most general sense, a portfolio is
"a container of evidence of someone's knowledge, skills and dispositions"
(Collins, 1990, p. 159). Creation of a collection of work
produced over time has long been an accepted form of assessment 
in the arts and humanities, and it has received some attention in 
mathematics in recent years (Mumme, 1990). In general, attention
to portfolios has focused on the use of this technique as an 
alternative or supplement to formal testing as a means for evaluat- 
ing student achievement. For our purposes here, it is more impor- 
tant to focus on portfolios as a source of useful information for 
instructional guidance. 

Although portfolios could conceivably be assembled for a 
variety of purposes, the two purposes most often discussed are to 
display the "best work" or demonstrate "growth over time." The 
purpose of the portfolio determines the criteria that will be used in 
selecting its contents. If the portfolio is meant as a summative 
display of proficiency, then the samples representing a student's 
best work are most appropriate for inclusion. In contrast, if the 
purpose is for documentation of growth and progress over time, 
then it would be desirable for it to contain dated examples of 
student work, including drafts and final copies of projects, solution 
attempts (both successful and unsuccessful) for a particular prob- 
lem, and perhaps some examples of the use of concepts or proce- 
dures early in a course as contrasted with their use later in the 
course. 

Most discussions of classroom use of portfolios emphasize the 
benefits of involving students actively in the selection of items to 
include in their portfolios. In this way, they can engage in an 
important process of self-assessment and the portfolio can be 
further personalized, such as when a student who is interested in
art decides to include examples of pictorial representations of
and solutions for mathematics problems. The following
(nonexhaustive) list compiled by Stenmark (1989) suggests the
wide range of contents that might be included in a mathematics
portfolio:

■ Written descriptions of the results of practical or math- 
ematical investigations, pictures and dictated reports from 
younger students; 

■ extended analyses of problem situations and investigations; 

■ descriptions and diagrams of problem-solving processes; 

■ statistical studies and graphic representations; 

■ reports of investigations of major mathematical ideas; 

■ responses to open-ended questions or homework problems; 

■ group reports and photographs of student projects; 

■ video, audio, and computer-generated examples of student 
work. 

For younger children, or for students who are compiling a 
portfolio for the first time, the teacher might maintain the 
portfolio for each student and periodically review it with the 
student until the student becomes more familiar with the 
process (Collins, 1990).

The states of California and Vermont have taken an active 
role in the development of portfolio assessment and have each 
published examples of student work as well as scoring guidelines. 
The California Assessment Program investigated the use of two
categories for scoring individual portfolios — evidence of
mathematical thinking (e.g., organization of data, conjecturing,
exploring) and the quality of activities and investigations (e.g., evidence
of significant investigations, connections between mathematical
content areas) — and one category for evaluating the portfolios from
an entire class based on the variety of approaches and investigations
used across the set of portfolios (Mumme, 1990; Pandey,
1991). In its report of results from the 1990-1991 pilot project, the
state of Vermont (Vermont Department of Education, 1991) published
the scoring guide for the "best pieces" component of the
portfolio project along with sample student responses at some of
the score levels. The scoring criteria were based on two elements,
problem solving and mathematical communication, and on these
criteria within each element:

■ Problem solving (understanding the task, quality of approaches
and procedures, decisions along the way, outcomes
of activities);

■ Mathematical communication (language of mathematics, 
mathematical representations, clarity of presentation). 

For the classroom mathematics teacher the portfolio pro- 
vides a comprehensive view of students' mathematical experi- 
ences over time that combines the advantages of other forms of 
instructionally embedded assessment. Virtually all of the class- 
room assessment information sources that have been discussed in 
this section — student journals, written responses to open-ended 
problems, summaries of group projects or independent investigations,
homework papers — can be included in a portfolio. The
coupling of portfolios with information from classroom discourse 
and activity (e.g., observations and interviews) constitutes a 
multifaceted approach to evaluation of individual student achieve- 
ment and a vast reservoir of information for making instructional 
decisions. 

Although information from portfolios and other sources can 
be directly beneficial to classroom teachers, it is less clear how this 
information can be transmitted beyond the classroom door and 
into the hands of other education professionals who also engage in 
instructional decision making. However, there are some ways in 
which other teachers, curriculum developers, and mathematics 
specialists can benefit from instructionally embedded assessment 
information. For example, an eighth-grade mathematics teacher 
could receive the portfolios that her students compiled last year in 
their seventh-grade mathematics classes. By looking over students'
prior work in mathematics, the new teacher could make preliminary
diagnostic decisions regarding strengths and weaknesses on
previous mathematics content and, perhaps, make some preliminary
predictions regarding areas in which students may experience
success or difficulty. Student portfolios might also be useful to
curriculum developers and mathematics supervisors. For example, 
a curriculum development committee could devote a series of 
meetings to studying a sample of student portfolios at a particular 
grade level to determine the extent of content coverage. Mathematics
supervisors could examine a set of portfolios on the basis of the
diversity of activities included, and they could plan staff development
activities directed at potentially interesting sources that
might not have been included (e.g., projects, journals).

TOWARD A VISION OF INSTRUCTIONALLY
GUIDED ASSESSMENT AND
ASSESSMENT-GUIDED INSTRUCTION

In this chapter we have tried to view assessment, whether originat- 
ing outside or inside the classroom, as an important source of 
information for instructional decision making in mathematics. In
our review, we have pointed out the generally limited value of 
externally mandated tests in providing instructional guidance 
information for classroom teachers, but we have also noted some 
ways in which the information from such testing might be useful 
at a more global level to other instructional decision makers, such 
as administrators, curriculum developers, or supervisors. On the 
other hand, we have seen that a rich instructional program in 
mathematics has embedded within it a great deal of useful assess- 
ment opportunities and information for classroom teachers, yet it 
is more difficult to see how this instructionally embedded assessment
information could be shared with others outside the classroom.
A fundamental challenge, then, is to blend the best of both
types of assessment to develop approaches that allow the display of
a wide array of assessment information that can be helpful both to
classroom teachers and other instructional decision makers.

In our consideration of external assessments, we noted that 
these tests often have little to offer in the way of instructional 
guidance and that when they affect instructional practice the 
effects are often judged to be negative. Unfortunately, this theme
is not a new one. More than a decade ago, a National Institute of 
Education (1979) conference report on testing and instruction 
contained the following statements: "Current testing procedures
are not helpful to teachers or students in their day-to-day efforts to 
teach and learn" (p. v) and "Present day testing programs are largely 
extraneous to everyday classroom teaching" (p. 3S9). Given the 
interest in improving the relationship between assessment and 
instruction, there have been many calls for changes in testing 
practice that would result in test formats being aligned more with 
instructional tasks and in test results that would be more useful for 
instructional decision making (e.g., Glaser, 1986; Linn, 1988; 
Nitko, 1989). Some have noted that progress in the field of cogni- 
tive psychology offers a new perspective on educational measure- 
ment and evaluation with respect to innovative assessment design 
models (Snow & Lohman, 1989) and the characterization and
measurement of skilled performance (Glaser, 1986, 1988). The kind
of assessment that is envisioned would involve both assessment-guided
instruction and instructionally guided assessment.

To date, the most prevalent approach to aligning assessment 
efforts with instruction in mathematics has been to focus on 
assessment-guided instruction by altering the content and form of 
externally mandated tests. In particular, noting the reported ten- 
dency of teachers to be influenced by the content and form of 
externally mandated tests (e.g., Madaus et al., 1992), many educational
reformers have advocated an assessment-driven reform strategy.
The major premise of this strategy can be called the "what you
test is what you get" (WYTIWYG) principle; that is, teachers will
devote substantial instructional attention to the subject-matter 
content and item formats represented on externally mandated 
tests. Therefore, some educational reformers have concentrated on 
substantially altering these tests, with the hope of thereby influ- 
encing classroom instruction to move in desirable directions. The 
results of these efforts are now appearing in some state-level and 
district-level testing programs. For example, as noted earlier in this 
chapter, some states have developed mathematics assessments 
that not only attempt to measure a range of curricular topics 
broader than that covered in the typical classroom in the state, but 
they also utilize non-multiple-choice formats, including open-ended
tasks (e.g., California), collaborative group assessments (e.g.,
Connecticut), and portfolios (e.g., Vermont). The content and
performance goals of these tests are compatible with documents
like the NCTM Curriculum Standards, and the tasks and activities
used in these assessments are not unlike those mentioned earlier 
in this chapter as characteristic of a strong classroom mathematics 
instructional program. There are at least two expected benefits of 
influencing teachers to "teach to the test" in this case: teachers
may begin to use the alternative measures in their own assessments,
and the test results may be more useful to the teachers for
instructional guidance. 

There are, however, some limitations to an assessment-driven
reform strategy such as WYTIWYG. Silver (1992) argues that
the "teaching to the test" phenomenon may not be sufficiently
robust to support the reform effort in mathematics assessment. He
further argues that a more realistic view would be "what you get is
what I can teach" (WYGIWICT). Teachers, especially elementary
school teachers with limited knowledge of and experience with
mathematics, generally tend to feel more comfortable with and
capable of teaching lower level knowledge and skills rather than 
more complex knowledge and processes. These teachers are quite
likely to be influenced less by higher level content on tests than the
assessment-driven reform advocates might hope. Because much of 
the evidence pointing to the influence of tests on instructional 
practice has found the influence to be in the direction of basic skills, 
which is the direction also predicted by WYGIWICT, it is difficult 
to predict the impact that higher level testing alone could have on 
mathematics teaching. Moreover, some reports on teachers' beliefs 
and actions (e.g., LeMahieu & Leinhardt, 1985; Salmon-Cox, 1981)
contain evidence that the relationship between testing and teaching
is more complex than that implied by WYTIWYG, because
teachers are not always greatly influenced by the content or format 
of standardized or externally mandated achievement tests. Given 
this view, it is unlikely that an assessment-driven reform strategy 
can be successful without attention also to the continuing educa- 
tion and support of teachers to fortify their classroom instructional 
programs. Any approach that emphasizes external assessments and 
ignores the instructional programs is unlikely to succeed in mak- 
ing substantive improvements in the teaching and learning of 
mathematics.

The vision of merging assessment-guided instruction and 
instructionally guided assessment will be realized if classroom 
mathematics instructional programs are strong, if attention is 
given to making good use of the naturally occurring opportunities 
to collect assessment information that are embedded in instruction,
and if externally mandated assessments become instructionally
guided, that is, closely tied to important instructional goals. The 
likelihood of this merger occurring will be enhanced if viable 
models and approaches can be identified and implemented. 

The example from Australia discussed earlier in this chapter 
may be a good model to consider because of the approach it takes 
to blending externally mandated assessment with classroom
instructional needs. According to Stephens and Money (1995), the
Victoria Curriculum and Assessment Board requires an external
examination of students for high school credit in their mathematics
courses. The examination consists of four parts: (1) an investigative
project representing 15 to 20 hours of student work, (2) a
challenging problem chosen from a set of four problems and
requiring about 6 to 8 hours of student work, (3) a 90-minute,
multiple-choice, midterm test of skills and standard applications,
and (4) a 90-minute final test requiring solution of four to six
problems, some of which are routine and others nonroutine. Thus,
this examination demands several different types of student performance
and assesses a wide range of mathematical proficiencies
associated with a course of study. As noted earlier, some parts of
these assessments are administered and scored by classroom teachers,
and teachers are required to devote some portion of their
classroom time to parts (1) and (2), which require a substantial time
commitment on the part of students. Because the scoring criteria
for these challenging tasks are shared with teachers and because the
tasks themselves become part of the instruction during the course,
the external exam becomes a focus of instructional attention. Yet, 
because the tasks represent important and desirable educational 
outcomes, the instructional influence is appropriate. Moreover, 
the examination design is heavily influenced by a consideration of 
important instructional goals. 

Another model approach to consider is the use of portfolios. 
As they have been used in Vermont's pilot project, portfolios 
represent another way to blend the needs of externally mandated 
assessment with classroom instructional needs. Like the Austra- 
lian examination system, the assessment plan under development 
in Vermont consists of several different kinds of student perfor- 
mances: "best pieces" of student work identified by a student, a 
broader compilation of a mathematical work, and a uniform test of 
a student's knowledge and understanding of mathematics concepts 
and procedures (Vermont Department of Education, 1991). Class- 
room teachers were heavily involved in the Vermont portfolio pilot 
study in a variety of ways, such as providing the tasks and activities 
that were included in their students' portfolios and helping their 
students select the "best piece" examples. Teachers also partici- 
pated at the state level by serving on the committee that set the 
portfolio scoring criteria and by participating in the rating sessions 
for the sample portfolios. It should be noted, however, that a 
number of technical problems have been detected in Vermont's use 
of portfolio assessment to report student performance (Koretz,
Klein, McCaffrey, & Stecher, 1993) and these technical problems
must be solved before the portfolio model is formally adopted in 
Vermont. Nevertheless, despite the technical problems encoun- 
tered in Vermont's externally mandated, state-level assessment 
program, it appears that the portfolio model has considerable 
potential for merging the interests of assessment and instruction. 
In particular, as portfolio evaluation guidelines are made available 
to teachers and as teachers are trained to evaluate portfolios, it is 
likely that teachers will devote instructional attention to the 
variety of mathematical activities that are embodied in the portfolio
process, thereby promoting instructional reforms at the classroom
level.

Neither the Victoria examination nor the Vermont portfolio
approach is likely to be viewed by all as viable and desirable. Yet, 
the need to develop alternative approaches to assessment that can 
serve the needs of external accountability and internal instructional
guidance is absolutely critical. If the efforts of this wave of
mathematics education reform are to succeed, assessment and 
instruction will need to become meshed to a greater extent than 
ever before. Tests will need to become viewed as providing vital 
information for instructional guidance. In this way, we may see the 
realization of Glaser's futuristic prediction: "In the twenty-first 
century, tests and other forms of assessment will be valued for their 
ability to facilitate constructive adaptations of educational programs"
(1986, p. 4S). Moreover, as instructional programs are
enriched to engage students actively in the consideration of math- 
ematical ideas, with an emphasis on problem solving, reasoning, 
and communication, teachers will need to be more skilled in 
extracting important information from the on-going activities in 
the assessment-rich environment of the classroom. 

As the nation rises to the challenge of establishing national 
standards and national testing programs to monitor students' progress
toward those standards, it is more important than ever to remember
that the kinds of tests we need are those that can also serve well as
guides for instruction. It would also be wise for us to recall that
instructional activities in the classrooms of good teachers can be a
rich source of assessment information — richer by far than any single
test, both for measuring student achievement and also for informing
and guiding instructional decisions. Our educational system is 
unlikely to be improved by designing and implementing a national 
examination system unless that system has at its heart solid instruc- 
tional goals for students and a sensible approach to assessment that 
blends internal and external forms of assessment information and 
that includes attention both to instructionally guided assessment
and to assessment-guided instruction.

1. The preparation of this paper was supported in part by a grant from
the Ford Foundation for the QUASAR project (grant number H90 0S72).
Any opinions expressed herein are those of the authors and do not
necessarily reflect the views of the Ford Foundation.

2. Although space does not permit a thorough discussion of the varied
purposes of external assessment, those who use information from such
tests to make instructional decisions should be mindful of the purposes for
which the test was designed and use the information accordingly. Fairly
complete, or otherwise interesting, treatments of different purposes and
forms of externally developed or externally mandated assessment can be
found in Airasian & Madaus (1972), Frechtling (1989), Linn (1988), Nitko
(1989), Payne (1982), and Whitney (1989).

3. For a general discussion of the mismatch between commercial
standardized tests and the goals of a "thinking curriculum," see Resnick
& Resnick (1991) and R. G. Brown (1991). The relationship between
external testing and mathematics education reform is discussed by Silver
(1992).

4. QUASAR (Quantitative Understanding: Amplifying Student 
Achievement and Reasoning) is a research and development project aimed
at the enrichment of the mathematics instructional program for students
attending middle schools (grades 6-8) in economically disadvantaged
communities (Silver, 1993). One of the project activities — in addition to
providing technical assistance to schools and carefully monitoring program
activities at the school and classroom levels — has been the design
and administration of a collection of assessment tasks to measure growth
in students' problem solving, reasoning, and communication.

CHANGING MATHEMATICS EDUCATION

Mathematics education is changing rapidly in a number of countries.
In several — The Netherlands, Denmark, Australia — these
changes began to take place in the 1980s. Other nations are 
currently in the process of reforming mathematics education. One 
of the most visible dialogues regarding change has been taking 
place in the United States, where the Mathematical Sciences 
Education Board (1990) has advocated restructuring the entire
mathematics curriculum in terms of the following changes in the 
context of mathematics education. 

■ Changes in the need for mathematics. As the economy
adapts to information-age needs, workers in every sector
must learn to interpret intelligent, computer-controlled
processes. Most jobs now require analytical rather than
merely mechanical skills, so most students need more
mathematics in school as preparation for routine
jobs. . . . Similarly, the extensive use of graphical, financial,
and statistical data in daily newspapers and in public policy 
discussions suggests a higher standard of quantitative lit- 
eracy for the necessary duties of citizenship. 

■ Changes in mathematics and how it is used. In the past
quarter of a century, significant changes have occurred in 
the nature of mathematics and the way it is used. In part, 
it is because of the nature and rapidity of these changes that 
the social constructivist philosophy of mathematics has 
emerged. Not only has much new mathematics been discovered
but also the types and variety of problems to which
mathematics is applied have grown at an unprecedented
rate. Most visible, of course, has been the development of
computers and the explosive growth of computer applications.
Most of these applications have required the
development of new mathematics in areas in which this
was not feasible before the advent of computers (Geoffrey 
Howson, personal communication). 

■ Changes in the role of technology. Computers and calcula- 
tors have changed the world of mathematics profoundly. 
They have affected not only what mathematics is important
but also how mathematics is done (Rheinboldt, 1985).
The changes in mathematics brought about by computers 
and calculators are so profound as to require readjustment 
in the balance and approach to virtually every topic in 
school mathematics. 

■ Changes in American society. The changing demographics 
of the country and the changing demands of the workplace 
are not reflected in similar changes in school mathematics
(MSEB, 1989). In the early years of the next century, when
today's school children will enter the workforce, most jobs
will require greater mathematical skills (Johnston & Packer,
1987). At the same time, white men — the traditional base of
mathematically trained workers in the United States — will
represent a significantly smaller fraction of new workers
(Oaxaca & Reynolds, 1988). Society's need for an approach to
mathematics education that ensures achievement across 
the demographic spectrum is both compelling and urgent. 

■ Changes in understanding of how students learn. Learning
is not a process of passively absorbing information and 
storing it in easily retrievable fragments as a result of 
repeated practice and reinforcement. Instead, students ap- 
proach each new task with some prior knowledge, assimi- 
late new information, and construct their own meanings 
(Resnick, 1987). This constructivist, active view of learning
is obviously consistent with the social or cultural view
of mathematics and must be reflected in the way mathematics
is taught.

■ Changes in international competitiveness. Just as recognition
of the global economy is emerging as a dominant force
in American society, many recent reports have shown that
U.S. students do not measure up in their mathematical
accomplishments to students in other countries (e.g.,
Lapointe, Mead, & Phillips, 1989; McKnight et al., 1987;
Stevenson, Lee, & Stigler, 1986; Stigler & Perry, 1988). The
implication of such data for employers is that the American
workforce will not be competitive with workers from
other countries.

These points make the argument that a complete redesign of the
content of school mathematics and the way it is taught is urgent.

Changing global conditions have led to changing goals in the 
schools. In The Netherlands, for instance, national educational 
goals, for the majority of the children, are 

1. To become an intelligent citizen (mathematical literacy);

2. To prepare for the workplace and for future education; 

3. To understand mathematics as a discipline. 

Such goals resemble closely those articulated by the British Com- 
mittee of Inquiry into the Teaching of Mathematics in Schools 
(Cockcroft, 1982) as responsibilities of the teacher:

■ Enabling each pupil to develop, within his [and her] capabilities,
the mathematical skills and understanding required for
adult life, for employment, and for further study and training. 

■ Providing each pupil with such mathematics as may be 
needed for the study of other subjects. 

■ Helping each pupil to develop so far as it is possible [an]
appreciation and enjoyment of mathematics itself and [a]
realization of the role it has played and will continue to 
play both in the development of science and technology 
and of our civilization. 

■ Above all, making each pupil aware that mathematics
provides ... a powerful means of communication.

A fourth set of goals addressing change in mathematics 
education was prepared by the Commission on Standards for 
School Mathematics of the National Council of Teachers of Mathematics
in 1989. In its report, Curriculum and Evaluation Standards
for School Mathematics, NCTM lists four societal goals and
five goals for students (NCTM, 1989).

The four general societal goals for mathematics education are 

1. Mathematically literate workers. The technologically 
demanding workplace of today and the future will require 
mathematical understanding and the ability to formulate and solve 
complex problems, often with others. 

2. Lifelong learning. Most workers will change jobs frequently 
and so need flexibility and problem-solving ability to enable 
them to explore, create, accommodate to changed conditions, 
and actively create new knowledge over the course of their 
lives. 

3. Opportunity for all. Because mathematics has become 
"a critical filter for employment and full participation in our 
society," it must be made accessible to all students, not just 
the white males, the group that currently studies the most 
advanced mathematics. 

4. An informed electorate. Because of the increasingly technical 
and complex nature of current issues, participation by 
citizens requires technical knowledge and understanding, 
especially skills in reading and interpreting complex 
information. (NCTM, 1989, pp. 3-5) 

Then, asserting that educational goals for students "must reflect 
the importance of mathematical literacy," NCTM proposes five 
general goals for students: 

1. Learning to value mathematics. Understanding its evolution 
and its role in society and the sciences. 

2. Becoming confident of one's own ability. Coming to trust 
one's own mathematical thinking and having the ability to 
make sense of situations and solve problems. 

3. Becoming a mathematical problem solver. This is essential 
to becoming a productive citizen and requires experience in 
solving a variety of extended and nonroutine problems. 

4. Learning to communicate mathematically. Learning the 
signs, symbols, and terms of mathematics. 

5. Learning to reason mathematically. Making conjectures, 
gathering evidence, and building mathematical arguments. 

These goals imply that students should be exposed to numerous 
and varied interrelated experiences that encourage them to 
value the mathematical enterprise, to develop mathematical 
habits of mind, and to understand and appreciate the role of 
mathematics in human affairs; that they are encouraged to 
explore, to guess, and even to make errors so that they gain 
confidence in their ability to solve complex problems; that they 
read, write, and discuss mathematics; and that they conjecture, 
test, and build arguments about a conjecture's validity. . . . The 
opportunity for all students to experience these components of 
mathematical training is at the heart of our vision of a quality 
mathematics program. The curriculum should be permeated 
with these goals and experiences such that they become 
commonplace in the lives of students. (NCTM, 1989, pp. 5-6) 

Each of the five goal statements reflects a shift away from 
traditional practice. Traditional skills are subsumed under more 
general goals for problem solving, communication, and the 
development of a critical attitude. 

Changing Theories 

At the same time that the goals of mathematics education are 
changing, we are also witnessing the evolution of new theories for 
the learning and teaching of mathematics. Romberg (1991) points 
out that these sets of goals all implicitly reflect a social constructivist 
philosophy of mathematics. Galbraith (1993) compares the 
conventional and constructivist paradigms, as paraphrased below: 

■ There exists a simple reality that is realized in universal 
laws and can be verified by objective observation (Conven- 
tional), as opposed to a series of multiple constructed 
realities, where truth is relative (Constructivism). 

■ Facts and values are independent in the conventional view, 
but interdependent in the constructivist view. 

■ Problem solutions have widespread application across con- 
texts (Conventional); problem solutions have only local 
applicability (Constructivism). 

■ Phenomena have no meaning except in the context for 
which the construction occurred (Constructivism). (pp. 
73-74) 

The constructivist is prepared to examine a set of results and consider 
their possible application to other situations given the contextual 
features of both. Transferability, rather than generalizability, 
characterizes this aspect of the search for consensus. 

At the Freudenthal Institute, the "theory for realistic mathematics 
education" evolved after twenty years of developmental 
research that in several important respects correlates with the 
constructivist approach (see de Lange, 1987; Freudenthal, 1983, 
1991; Gravemeijer, van den Heuvel-Panhuizen, and Streefland, 
1990; Treffers, 1987). There are, however, some differences. 

The social constructivist theory is in the first place a theory 
of learning in general, whereas realistic mathematics theory is a 
theory of learning and instruction that evolved only in mathemat- 
ics. One of the key components of realistic mathematics education 
is that students reconstruct or reinvent mathematical ideas and 
concepts through exposure to a variety of "real-world" problems 
and situations. 

This process takes place by means of progressive schematization 
and horizontal and vertical mathematization. The students 
are given opportunities to establish their own pace in the 
concept-building process. At some point, abstraction, formalization, 
and generalization take place, although this may not occur for 
all students. The question, for instance, of how far we can be 
successful within mathematics if our students master "only" the 
skill of transferability, instead of generalizability, is still open for 
discussion. 

Changing Content 

It is not only goals and teaching and learning theories of mathematics 
education that have changed. New subjects are slowly and 
sometimes cautiously introduced into curricula — a prominent 
example is discrete mathematics, and there seems to be a revival of 
geometry. Some of these subjects take their place in the curriculum 
because new technologies have opened new possibilities. The 
computer has had some (limited) impact on the teaching of mathematics, 
but future development might have more visible effects. 
A graphic calculator with a computer algebra system would outdate 
both personal computers and graphic calculators as we now know 
them. Also, if interactive CD enters the consumer market, it will 
in all likelihood become an important tool in education as well. 

Apart from these external factors, internal factors are operating 
to change the content of school mathematics. We mentioned 
the revival of geometry that, offered with new insights and in a 
broader context, gives it different content. Other domains that 
would expand the curriculum, including the central place of calcu- 
lus, the emphasis on fractions and percentages, the role of loga- 
rithms, have been discussed in the last decade with mixed results. 
Finally, new insights into how children learn and what didactical 
tools we possess to enable children to understand better certain 
mathematical tends are important issues. Changing learning theories 
can definitely lead to new content subjects. 

Assessment 

There seems to be a lot of truth in Galbraith's conclusion (1993) 
that we need to confront inherent contradictions that exist when 
constructivism drives curriculum design and knowledge construction, 
but that positivistic remnants of the conventional paradigm 
drive the assessment process. In The Netherlands, this distinction 
confronted us with a paradox. Many teachers and researchers react 
with, "I like the way you have embedded your math education in 
a rich context, but I will wait for the national standardized test to 
see if it's been successful." Popper (1968) and Phillips (1987) have 
argued that a theory can be tested only in terms of its own tenets. 
This means that the constructivist or realistic mathematics education 
of teaching and learning can be evaluated only by assessment 
procedures derived from the same principle. It also means that 
assessment procedures should do justice to the goals of the curriculum 
and to the students. Context-independent generalized testing 
is unjust in such a case (most of the time, the context will also 
include the real world of mathematics itself, at least in the realistic 
mathematics education approach). Therefore, an essential question 
is: Does assessment reflect the theory of instruction and 
learning represented by the curriculum? 

Not only have new notions about learning influenced the 
ideas about "authentic" assessment, but the new goals, emphasiz- 
ing reasoning, communication, and the development of a critical 
attitude, will have an impact. Popularly associated with "higher 
order" thinking, these skills were seldom or never present in 
traditional education and assessment. The change toward a "thinking" 
curriculum forces us to focus on "thinking" assessment as 
well. 

In the next four sections of this chapter, therefore, we will 
examine levels in assessment, the role of content, necessary and 
sufficient information, and different test formats. In the final 
section, we consider briefly a number of issues that bear directly on 
the design of assessments for use in the learning and teaching of 
authentic mathematics. 



LEVELS IN ASSESSMENT 



Most instruction in mathematics education has focused on learning 
to name concepts and objects and to follow specific procedures. 
The result, as Bodin (1993) points out, is that a student can solve a 
given equation without being able to express the steps taken or to 
justify the results, without knowing which type of problem it is 
connected to, or without being able to use it as a tool in another 
situation. As an example, Bodin observed children who were able 
to solve the following equation, 

7x - 3 = 13x + 15 

but who were unable to answer the question: 

Is 10 a solution to the equation 7x - 3 = 13x + 15? 

Here we notice different levels of "knowledge." The equation can 
be solved simply by following a procedure, but the latter question 
requires judgment. 
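
The contrast between the two tasks can be made concrete. The following is a minimal sketch in Python, assuming only the equation quoted above; the function names are illustrative and do not come from Bodin's study. Answering the second question needs nothing more than substituting 10 into both sides, whereas solving the equation follows the familiar procedure of collecting terms.

```python
from fractions import Fraction


def is_solution(x):
    """Substitute x into both sides of 7x - 3 = 13x + 15 and compare."""
    return 7 * x - 3 == 13 * x + 15


def solve():
    """Procedural route: move terms to get 7x - 13x = 15 + 3, then divide."""
    return Fraction(15 + 3, 7 - 13)


print(is_solution(10))       # False: the left side is 67, the right side is 145
print(solve())               # -3
print(is_solution(solve()))  # True
```

A student who can carry out `solve` but cannot see that `is_solution(10)` is a one-line check is operating only at the procedural level described here.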

In a global economy, the emphasis for all students must shift 
from following rote procedures to the development of the higher 
order thinking skills. L. Resnick (1987) listed salient features of 
higher order skills, many of which are in stark contrast to the 
mathematics criteria that prevail in many schools. She noted that 
higher order thinking skills tend to be complex, their total path not 
"visible" (mentally speaking) from any single vantage point. Fur- 
thermore, higher order thinking involves the application of mul- 
tiple criteria that sometimes conflict with one another. 

Experiences with higher order thinking in mathematics and 
its assessment have also been described in some detail by de Lange 
(1987), who stressed the process-versus-product character of the 
new curriculum. During experiments in The Netherlands over the 
last decade, it became clear that the mathematics in the new 
curriculum is nonalgorithmic, has multiple solutions, and in- 
volves uncertainty and a need for interpretation. Thus, one of our 
major challenges is to find structure in apparent disorder; we need 
to carry out considerable work in the kinds of elaborations and 
judgments required to reinforce the place of higher order thinking 
in the new curriculum. 

Further, we need to address the problem at the different 
cognitive levels in both instruction and assessment. For such 
needs, we identify three cognitive levels, which, although somewhat 
arbitrary, are sufficient and reflect the decade-long experience 
we had with our research on the content and implementation of the 
new mathematics curriculum in The Netherlands. To describe 
these three levels, it is first necessary to define the guiding 
principles and goals for assessment that we employed. 

New Principles and Goals for Assessment 

In The Netherlands in 1987, we formulated five principles that 
have guided our assessment work: 

■ The first and primary purpose of testing is to improve 
learning and teaching. 

■ Methods of assessment should enable the students to 
demonstrate what they know rather than what they do not 
know. 

■ Assessment should operationalize all of the goals of math- 
ematics education. 

■ The quality of mathematics assessment is not determined 
by its accessibility to objective scoring. 

■ The assessment tools should be practical (de Lange, 1987). 

The First and Primary Purpose of Testing Is to Improve Learning 
and Teaching. This first principle is easily underestimated in the 
teaching-learning process. All too frequently we think of testing as 
an end-of-the-unit or end-of-the-course activity whose primary 
purpose is as a basis for assigning course grades. A properly designed 
test or task should motivate students not only by providing them 
with short-term goals toward which they work, but also by providing 
them with feedback concerning the learning process. Furthermore, 
more complex learning results, such as levels of understanding, 
application, and interpretation, are likely to be retained longer 
and have greater transfer value than results at the knowledge level. 
This means that we should include measures of these more complex 
learning results in our tests. In this way we provide students 
practice in reinforcing the comprehension, skills, applications, and 
interpretations that we are attempting to develop. 

Methods of Assessment Should Enable Students to Demonstrate 
What They Know Rather Than What They Do Not Know. This 
principle, sometimes referred to as positive testing, is borrowed 
from Cockcroft (1982). Most traditional testing consists of checking 
what the students do not know; students are given a specific 
problem that has, in most cases, a single solution. If the student 
does not know how to solve the problem, there is usually no way 
to gauge what he or she does know. One result may be that the 
student loses confidence, an effect not conducive to promoting 
"the development of the talents of all people" (MSEB, 1991). 

Assessment Should Operationalize All of the Goals of Mathematics 
Education. The fact that tasks that operationalize "higher order 
thinking skills" are difficult to design and score should be no reason 
to restrict ourselves to testing as usual. It is essential that we 
be able to test the capacity of students for mathematization, 
reflection, discussion of models, communication, creativity, 
generalization, and transfer. This means also that we are less interested 
in the product than in the process that leads to this product. The 
consequence is that we need a variety of effective assessment 
methods. 

The Quality of Mathematics Assessment Is Not Determined by Its 
Accessibility to Objective Scoring. This principle is a very important 
one. In the first place, the quality of a test is frequently derived from 
the accessibility to mechanical or objective scoring, a problem 
endemic in the United States. It may be difficult to score more 
complex tasks, but experience shows that at the same time the 
advantages are much greater than the perceived disadvantages. In 
the first place, in complex context problems the problems become 
much easier for the student to understand if he or she makes the 
problem his or her own, and the answers show what the student is 
actually capable of doing. In traditional tests, we often cannot even 
tell whether the student understands the question fully, let alone 
find in the answer any indication of level of understanding. Second, 
the professional mathematician is not evaluated on the basis of 
tests but on the basis of his or her output. Finally, an important 
aspect of educating the mathematics education community is the 
development of new forms of assessment and of guidelines for 
judging all forms of assessment. 

The Assessment Tools Should Be Practical. What we mean by 
practicality is that assessment should have practical applications 
in the school. At a given school, a balanced package of 
assessment tools will be different from those used at another 
school because of physical limitations, differences in school 
culture, accessibility to outside resources, and other factors. We 
need also to bear in mind the demands made by assessment on 
the teacher. 

Lower Level Assessment 

At the lower level, we are dealing primarily with traditional 
mathematics and traditional tests. This level concerns objects, 
definitions, technical skills, and standard algorithms. The follow- 
ing are typical examples of lower level problems: 

■ Solve the equation 7x - 3 = 13x + 15. 

■ What is the average of 7, 12, 5, 14, 15, 9? 

■ Draw the graph of ^ - y’ - 2y + tS. 

■ Make a drawing that illustrates 1,M. 

■ Write 69% as a fraction. 

■ Line m is called the circle's ______. (Adapted from 1989 
Illinois State Board of Education testing materials, with 
permission.) 

Quite often, multiple-step problems from the real world are 
introduced at this level, although texts may treat them as standardized 
exercises with no real-problem meaning. The following illustrate 
this point: 

■ Christine borrowed $168 from the Friendly Finance 
Company. She had to pay 6% interest. 

— How much interest will she pay in one year? (NAEP, 
1990) 

■ We drove our car 170 miles and used 4 gallons of 
gasoline. 

— What was our mileage, that is, how many mpg? 

It can be argued that these items require more processing. But, in 
fact, the solutions simply involve routine, sequential processing. 
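
How routine these two items are shows up when they are written out. A minimal sketch in Python, assuming simple interest over one year for the loan item (the item itself does not say how interest is charged); the function names are illustrative only. Each answer is a single fixed operation on the numbers in the text.

```python
def simple_interest(principal, rate_percent, years=1):
    """Interest owed under the (assumed) simple-interest reading of the item."""
    return principal * rate_percent * years / 100


def miles_per_gallon(miles, gallons):
    """Mileage as distance divided by fuel used."""
    return miles / gallons


print(simple_interest(168, 6))   # 10.08
print(miles_per_gallon(170, 4))  # 42.5
```

Nothing in either function requires a decision about strategy, which is the sense in which such items stay at the lower level.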

Middle Level Assessment 

The middle level can be characterized by having students relate 
two or more concepts or procedures; thus, connections, 
integration, and problem solving are terms often used to describe 
this level. It is more difficult to provide examples at this level from 
extant testing sources, although there have been good tests that 
operationalize the principles we have articulated. However, several 
examples from our work, and the new work of others, illustrate 
the possibilities presented by the new curricula: 

■ You have driven your car 2/3 of the distance you want to 
cover and your tank is 1/4 full. Do you have a problem? 
(fifth grade) (Streefland, in Gravemeijer et al., 1990) 

■ In pictures A and B that follow: 

— How many of the cartons will be left after the box is 
filled? (A) 

— How many cartons do we need for 81 children? (B) (van 
den Heuvel-Panhuizen & Gravemeijer, 1990. Used 
with permission.) 

Note: Test items cited in this chapter may be released, but unpublished. 

■ You can use these rectangles to make other rectangles 
that are 2 units deep and of whatever width you choose. 
For example, here are some 2 x 3 rectangles: 

— Describe how many 2 x n rectangles it is possible to 
make from 2 x 1 rectangles (where n is a natural 
number). Justify your conclusion. 

— Extend your solution to describe how many 3 x n 
rectangles can be made from 3 x 1 rectangles. 

— Extend your solution further to describe how many m x n 
rectangles can be made from m x 1 rectangles (where 
m and n are natural numbers). (Victoria Curriculum 
and Assessment Board, 1990. Used with permission.) 
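
One way to see why the rectangle item calls for connecting ideas rather than applying a procedure is to sketch a possible line of reasoning in code. The following Python sketch is an illustration, not the examiners' marking scheme. It assumes that two arrangements count as different whenever the pieces lie differently, and that m is at least 2. The leftmost column is either one upright piece, leaving a strip one unit narrower, or the start of m pieces lying flat on top of each other, leaving a strip m units narrower.

```python
def count_arrangements(m, n):
    """Ways to build an m x n rectangle from m x 1 pieces (sketch, m >= 2)."""
    if m < 2:
        raise ValueError("this sketch assumes m >= 2")
    # For widths smaller than m only upright pieces fit, so there is one way.
    ways = [1] * (n + 1)
    for width in range(m, n + 1):
        # upright piece first, or a block of m flat pieces first
        ways[width] = ways[width - 1] + ways[width - m]
    return ways[n]


print([count_arrangements(2, n) for n in range(1, 8)])  # [1, 2, 3, 5, 8, 13, 21]
print([count_arrangements(3, n) for n in range(1, 8)])  # [1, 1, 2, 3, 4, 6, 9]
```

Under these assumptions the 2 x n case produces the Fibonacci pattern, and the generalization to m x n keeps the same two-case argument.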

Another example from a new standardized test in The Netherlands: 

■ Here you see three squares. In each one, a border of width 
1 cm has been made black. 




— Draw such a square with side x = 5 cm. 

— When the border has a width of 1 cm, the area (A) of the 
white inner part can be represented by the formula: 

A (white) = (x - 2)². 

Check this for the square you have just drawn. 

— Check whether or not the formula for A (white) has a 
meaning in each of the following cases: 

.\ = X = X = 1,000,000. 

— Compute for which value of x the area equals 400. 

— For the black area the formula representing the area is 

A (black) = 4x - 4. 

Compute for which value of x the area equals 400. 
The following graphs represent the formulas A (white) 
and B (black): 

[Graphs: A (white) and B (black)] 

— Compute (in one decimal) the value(s) of x for which 
the white and black areas are equal. 

— Sylvia has computed that for x = 7.4, the areas of the 
white and black parts are equal. Explain why this cannot 
be the case. (W12-16 team, 1991. Used with permission.) 
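
The last two questions of this item can be checked numerically. The sketch below, in Python, uses only the two formulas given in the item; the requirement that x be at least 2 (so that a white inner part exists) is an assumption added here for the illustration, and the function names are hypothetical.

```python
import math


def white_area(x):
    """Area of the white inner part for a square of side x (border 1 cm)."""
    return (x - 2) ** 2


def black_area(x):
    """Area of the black border for a square of side x."""
    return 4 * x - 4


# Setting (x - 2)^2 = 4x - 4 and rearranging gives x^2 - 8x + 8 = 0.
root = math.sqrt(8 * 8 - 4 * 8)
candidates = [(8 - root) / 2, (8 + root) / 2]

# Assumed context condition: the square needs side >= 2 to have a white part.
valid = [round(x, 1) for x in candidates if x >= 2]
print(valid)  # [6.8]

# Sylvia's value: the two areas differ, so 7.4 cannot be right.
print(round(white_area(7.4), 2), round(black_area(7.4), 2))  # 29.16 25.6
```

The smaller root of the quadratic (about 1.2) satisfies the algebra but not the situation, which is the kind of interpretation the item is probing.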

The examples, all taken from tests that are in use, clearly 
indicate features that do not belong on the lowest level. It is 
interesting to compare the two car fuel problems. In the first (We 
drove 170 miles and used 4 gallons of gasoline. How many mpg?), 
the students have already been trained to grab their calculator and 
to simply divide 170 by 4. The other problem (You have driven 2/3 
of the distance, and 1/4 tank of fuel is left. Do you have a 
problem?) does not describe a certain strategy. The children left on 
their own must design their own strategy. As a side effect, but a 
very important one, the teacher will get valuable feedback on the 
level of understanding of the student. Actual solutions of the 
(primary school) students in figure 4.1 show remarkable strategies 
(Streefland, 1990. Used with permission). 

Figure 4.1 Children's strategies for solving the "2/3 distance, 1/4 tank 
of fuel" problem. Used with permission. 

The items about milk cartons aim at more complex activities 
of a problem-solving nature (van den Heuvel-Panhuizen & 
Gravemeijer, 1990). The true-to-life contexts not only help the 
children to grasp immediately the situation of the items, but they 
also offer the opportunity to sound out the children's abilities while 
avoiding the obstructions caused by formal notation. The question 
in the milk carton problem is: How many do not fit in the box? 
whereas the item on the carton shown with the 6 cups asks the 
following question: How many packages of chocolate milk are 
needed for 81 children? First of all, the arithmetic operation is not 
obvious; moreover, even a correct calculation of 81 ÷ 6 does not 
directly yield an adequate answer. These kinds of questions give 
the teacher information about the children's informal knowledge, 
and solutions can be used to attune the teaching to the children's 
previous knowledge. In this way, tests become an instrument the 
teacher can use to improve learning. 
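
The remark that a correct division does not directly yield an adequate answer can be illustrated briefly. The sketch below, in Python, assumes one cup per child and six cups per package, as the picture described in the text suggests; the variable names are illustrative.

```python
children = 81
cups_per_package = 6  # assumption: each package fills the 6 cups shown

full_packages, children_left_over = divmod(children, cups_per_package)
print(full_packages, children_left_over)  # 13 3

# Rounding the quotient up gives the number of packages actually needed.
packages_needed = -(-children // cups_per_package)
print(packages_needed)  # 14
```

The quotient 13.5 is arithmetically correct, yet the situation requires 14 packages, so the child has to interpret the remainder rather than report the calculation.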

A last example that might qualify for the middle level is the 
following: 

■ 200 women and 200 men were given a test on how to run 
a family. The maximum possible score was 16. The 
results are represented in a box plot as follows: 

— Is it possible for the following box plot to represent the 
results of the test for all 400 participants? Explain your 
answer (de Lange, Burrill, Romberg, & van Reeuwijk, 
1993). 

This last question demands some real understanding of the box 
plot. The students agree that this is a difficult question. And even 
though they often sense what is wrong, they do not always have a 
very clear way of thinking on paper; for example, two responses 
were 

■ No, it is impossible because its center, 50%, is in the wrong 
position. (This, of course, is true. But the student failed to 
further communicate his reasoning properly on paper. So 
we might consider qualifying this item for the highest 
level, because of the reasoning and communication skills 
necessary to solve the problem. There were students who 
thought they lacked adequate information.) 

■ No, it would not be possible because you would have to 
write out all the data, find the new median, and that means 
that the first and second quartiles would be different. 

Or an even stronger response: 

■ There is no data to base this chart on available to me! 

"Correct" answers sometimes cannot be given appropriate credit 
because of the lack of information on the reasoning: 

■ No, the first quartile is not correct, neither is the third 
quartile. A correct answer could be this: For women, 50% 
lies right of the 9, and for men 25%. This means that 
100 + 50 = 150 persons lie on the right of 9. So it is 
impossible that the new box plot represents the results on 
the test. 

Higher Level Assessment 

It is even more difficult to describe higher level assessment 
than that at the middle level. This, of course, is partly because 
we are dealing with more complex material: mathematical thinking 
and reasoning, communication, critical attitude, interpretation, 
reflection, creativity, generalization, and mathematizing. 
We will highlight aspects of tests that operationalize some higher 
order thinking skills at different school levels. A major component 
will be the "construction" by children that completes the 
problem. 

Let us first look at some of the more "open" tests at primary 
school level. Especially when we are dealing with nonalgorithmic 
problems that relate to the student's real world, we also need to 
know the procedures the children use. Or, to put it even more 
strongly, we are more interested in the process than in the product 
(that is, the answer) because, of course, there might be multiple 
solutions. All of these arguments apply to the following test 
items. The first relates to the visit of a circus. 

■ The total admission costs for the children are $50. How 
much were the tickets? (van den Heuvel-Panhuizen & 
Gravemeijer, 1990. Used with permission.) 

The first pupil tries to approximate the total amount of $50 as 
closely as possible, while the others apply a formal division, or a 
less formal distribution strategy. The second item poses the following 
question: 

— How many children weigh the same as this bear? 




The bear item refers to the children's knowledge of measures. Only 
the weight of the polar bear is given. It is left to the pupils to 
determine how much a child generally weighs. Some children, like 
the first pupil, stick to their own weight; others prefer a round 
number, or they weigh precisely 50 or 25 kilograms. The third 
item is: 

■ Design as many sums as possible with the answer of 100 
(workspace is provided on the answer page). 

The objective of the third item is to elicit the capability of children 
for individual productions, their own responses. The child is asked 
to think, rather than to solve problems (Streefland, 1990; van den 
Brink, 1987). A simple way to estimate the scope of children's 
abilities is to ask them to produce an easy and a difficult sum. Due 
to this latitude in devising their own productions, children reveal 
not only what they are capable of, but also their manner of working. 
Some record only isolated sums, whereas others proceed systematically, 
for instance, by always changing the first term by one unit or 
by applying commutativity. 

Another primary school item that might fit on the highest 
level is the following: 

■ Martin lives three miles from school and Alice five 
miles. How far apart do Martin and Alice live from each 
other? 

This item might be seen as belonging to geometry. Or, perhaps it 
can be solved by just common sense reasoning. Or by visualizing. 
Multiple strategies are possible, at different levels. But it is almost 
certain that the students have never been purposely presented with 
an isomorphic exercise. 

Approximately seventy teachers were interviewed about the 
appropriateness of this item. Some of their first reactions to the 
question were, "5 - 3 = 2, so it's a simple subtraction (lower level) 
and for that reason we don't like it as a test item." A second reaction 
was, "You can't say the proper answer because there is no one 
proper answer, and for that reason this is not a good test item." A 
third reaction was: "You can't give the proper answer because there 
is not one proper answer, and for that reason it's a good test item." 
A typical reaction, falling in this third category, was, "You can't tell 
it exactly, but you can say something. For instance, that Martin and 
Alice cannot live farther away from each other than 8 kilometers, 
or no closer than 2 kilometers. You can show that with a nice 
picture." 
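
The bounds in this third kind of reaction can be explored with a small sketch. The Python code below is an illustration only and rests on an assumption the item does not state: that the two homes and the school lie on a flat plane and distances are measured in straight lines. The angle at the school between the two directions is then the only unknown, and the law of cosines gives the distance for each angle.

```python
import math


def distance_apart(martin, alice, angle_degrees):
    """Straight-line distance between the homes for a given angle at the school."""
    theta = math.radians(angle_degrees)
    return math.sqrt(martin**2 + alice**2 - 2 * martin * alice * math.cos(theta))


# Same direction (0 degrees) through opposite directions (180 degrees).
distances = [distance_apart(3, 5, angle) for angle in range(0, 181, 30)]
print(round(min(distances), 2), round(max(distances), 2))  # 2.0 8.0
```

Every value between 2 and 8 units is possible under this assumption, which is why the item has no single proper answer but still supports a definite statement.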

Looking at it in this way makes it a rich item that offers many 
possibilities for different strategies reflecting the reasoning of the 
students. But the teachers' reactions clearly demonstrate that we 
have a long way to go if we want to implement this kind of question. 
In a group of teachers favoring development of more "realistic" 
mathematics education, only 17 percent offered arguments like the 
one just quoted. The majority of teachers using more traditional 
books (57 percent) thought the item was unfit because it did not 
have one single answer. This ambiguity, the lack of one single 
answer, was by far the most frequent argument for including the 
item on the part of the teachers who liked it. In this brief analysis, 
it is evident how difficult the process of change toward "new" 
modes of assessment will be (Gravemeijer et al., 1992). 

There are many other techniques for encouraging students to 
"produce." The following example is a rather simple item: 

■ If no fish were caught, the number of fish will increase 
during the coming years. The graph shows a model of the 
growth in the number of fish. 

— Draw an increase diagram with intervals of a year, to 
start with the interval 1-2. The fish farmer will wait 
some years before he will be able to catch annually the 
same amount of fish as the first year; after every catch the 
number of fish increases again according to the graph. 

— What would you advise the fish farmer about the 
number of years he has to wait after planting the fish 
and the amount of fish that he will catch every year? 
Give convincing arguments (HAVO, National Examination, 
The Netherlands, experiment, 1990). 

One should be aware, when looking at this exercise, that the 
students were not familiar with the "differentiation of functions," 
but they did know about the changes and rates of change of real 
phenomena in a discrete way; that is, they were not used to 
graphing the derivative of a function, but they were accustomed to 
using increase diagrams. So the first question was very straightfor- 
ward, operationalizing only the lowest level. 

The other question is a different story. It was both new to the 
students and new in its form on the national standardized test in 
The Netherlands. Communicating mathematics, drawing conclusions, 
finding convincing arguments are activities that all too often 
are not a part of mathematics tests and examinations. Many 
teachers were surprised by the richness of the question and did not 
know what to think of this development, although a few, those who 
identified the question with the new approach as indicated in the 
experiments, were prepared to some extent. 

In some respects, the students seemed less surprised — if we
are to judge by the results — although their answers showed a wide 
range of responses: 

■ I would wait for four years and then catch 20,000 kilos per 
year. You can't lose that way, man. 

■ If you wait till the end of the fifth year then you have a big
harvest every year: 20,000 kg of fish: that's certainly not
peanuts. If you can't wait that long, and start to catch one
year earlier you can catch only 17,000 kg, and if you wait
too long (one year) you can only catch 18,000 kg of fish. So
you have the best results after waiting for five years. Be
patient, wait those years. You won't regret it. (Van der
Kooij, 1989. Used with permission.)

Generalization, adjustments of models, communication, and
reflection are only some of the characteristics of the following test;
only a relevant part (relevant for the highest level) has been
reproduced, because we will discuss this test later in more detail
(see also de Lange, 1987).

A forester has a piece of land with 3,000 Christmas trees. He
distinguishes three classes of length: S, M, and L. The small
trees have just been planted and have no economic value; the
medium trees are sold for $10 each and the large ones for $25.

At the beginning of his first year as owner, his tree farm has
1,000 small, 1,000 medium, and 1,000 large trees. All of these
grow uneventfully until just before Christmas.

From the experiences of colleagues, he knows approximately
how much growth to expect per year:

40% of the small trees become medium
20% of the medium trees become large

(Here we omit some lower- and middle-level questions.)

The forester wonders what strategy for cutting and planting is 
the most profitable one. He considers several strategies: 

I. Cut medium and large trees after one year and replant to
return to the starting population of 1,000 of each kind.

II. Cut medium and large trees after two years and replant to
return to the starting population of 1,000 of each kind.

III. Cut after one year the large trees only (leaving 1,000) and
replant that same number of small trees; repeat this the
second year.

Cutting costs for one tree are $1 and planting costs $2.

■ Decide which is the most profitable strategy over a two-year
period.

■ Find the matrix that represents the general case (with n =
number of length classes):

[Matrix template with class labels 2, 3, ..., N-1, N and entries
marked P; the layout is not recoverable from the source.]

How can you conclude from the matrix whether or not it is
possible to get the starting population back? What are the limitations
of the model? What refinements would you like to suggest?

It is interesting to see that many students did well on this test —
a point to be discussed later in this chapter. But we have to bear
in mind that this test item was completed by students at home,
so we feel it cannot be compared with other items discussed so
far, which were for the most part meant for restricted-time
written tests.
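
The three cutting strategies in the forester item can be compared numerically. The sketch below is only one possible reading of the item, not the examiners' solution: it assumes that every year 40% of the small trees become medium and 20% of the medium trees become large, that large trees stay large, that one small tree is planted for every tree cut, and that trees left standing after two years are not counted as profit. The function names (`grow`, `harvest`, `strategy_one`, and so on) are illustrative.

```python
# Sketch of the forester item under the assumptions stated above.
PRICE_M, PRICE_L = 10, 25    # dollars per medium / large tree sold
CUT_COST, PLANT_COST = 1, 2  # dollars per tree cut / planted
START = (1000, 1000, 1000)   # small, medium, large


def grow(s, m, l):
    """One year of growth: 40% of S become M, 20% of M become L."""
    return (s * 60 // 100,
            s * 40 // 100 + m * 80 // 100,
            m * 20 // 100 + l)


def harvest(pop, cut_m, cut_l):
    """Cut the given trees, plant one small tree per tree cut.

    Returns the new population and the profit of this harvest.
    """
    s, m, l = pop
    cut = cut_m + cut_l
    profit = (cut_m * PRICE_M + cut_l * PRICE_L
              - cut * CUT_COST - cut * PLANT_COST)
    return (s + cut, m - cut_m, l - cut_l), profit


def strategy_one():
    """Cut M and L back to 1,000 each after every year."""
    pop, total = START, 0
    for _ in range(2):
        s, m, l = grow(*pop)
        pop, profit = harvest((s, m, l), m - 1000, l - 1000)
        total += profit
    return total


def strategy_two():
    """Let the trees grow two years, then cut M and L back to 1,000."""
    s, m, l = grow(*grow(*START))
    _, profit = harvest((s, m, l), m - 1000, l - 1000)
    return profit


def strategy_three():
    """Cut only the large trees back to 1,000 after every year."""
    pop, total = START, 0
    for _ in range(2):
        s, m, l = grow(*pop)
        pop, profit = harvest((s, m, l), 0, l - 1000)
        total += profit
    return total


if __name__ == "__main__":
    for name, fn in (("I", strategy_one), ("II", strategy_two),
                     ("III", strategy_three)):
        print(f"Strategy {name}: ${fn()}")
```

Under these assumptions the two-year profits are $11,600 for strategy I, $11,080 for strategy II, and $9,680 for strategy III. Strategy III ends with more medium trees than it started with, so the ranking could change if the remaining stand were given a value, which is exactly the kind of refinement the item asks students to discuss.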



THE ROLES OF THE CONTEXT 

Problem-oriented mathematics education places mathematics in a 
context. In realistic mathematics education, the real world is used 
as a starting point for the development of mathematical concepts 
and ideas. According to Treffers and Goffree (1985), context problems
in realistic curricula fulfill a number of functions:

■ Concept formation: In the first phase of a course, they allow 
the students natural and motivating access to mathematics. 

■ Model formation: Context problems supply a firm basis for 
learning the formal operations, procedures, notations, rules, 
and they do this in conjunction with other models that 
function as important supports for thinking. 

■ Applicability: Context problems utilize reality as a source 
and domain of applications. 

■ Practice: The exercise of specific abilities in applied
situations.

In an earlier article (de Lange, 1979), distinctions were made among
the uses of context in a way that fit with these four functions. One
of the functions of context — and for realistic mathematics education
the most important characteristic — is its use for concept
formation, the conceptual mathematization process. This use of
context presents problems in assessment that are somewhat different
from the problems encountered in the other context classifications:
that is, we usually will not introduce new concepts during a
test, but apply new mathematical concepts in some way.

Functionality

No Context. This category hardly needs further elaboration.
However, we cannot resist the temptation to include a recent
example — from a standardized test from Poland (personal communication,
W. Zawadowski) — that shows a complex task without
context. The question that confronts us is this: At which level are
we working here? Is this higher order because it is so complex, or
is it lower order because it is repetitive?



What number is [illegible] of:

[Expression garbled in the source; the legible fragments are
sin 30°, 0.81, 2.25, cos 60°, and tan 45°.]



Camouflage Context. The context in this situation is used
only to "camouflage" or "dress up" the mathematical problem.
Most of the so-called word problems and multistep problems from
the NAEP (1990) are of this form. We refer, for instance, to the
problem of Christine and the Friendly Finance Company. Similar
problems would include

■ The growth factor of a bacterium type is 6 (per time unit).
At the moment there are 4 bacteria. Calculate the point in
time when there will be 100 bacteria.

■ The interest for a year on a savings account is 8%. $4,000
is deposited at time 0. At what point in time will this
amount have increased to $[illegible],000?
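
Once the "dressing" is removed, both items reduce to solving initial · g^t = target for t. A minimal sketch of that computation for the bacteria item follows; the function name `time_to_reach` is illustrative and not part of the original test.

```python
import math


def time_to_reach(initial, target, growth_factor):
    """Solve initial * growth_factor**t == target for t."""
    return math.log(target / initial) / math.log(growth_factor)


# Bacteria item: 4 bacteria, growth factor 6 per time unit, target 100.
t_bacteria = time_to_reach(initial=4, target=100, growth_factor=6)
print(round(t_bacteria, 2))  # about 1.8 time units
```

The savings item has the same structure with a growth factor of 1.08 per year, which is why the context adds nothing to the mathematics.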

In this category, there are also the familiar types of items, such as
the one that follows. The goal that should be operationalized with
this item is: Identify, analyze, and solve the problem using algebraic
equations, inequalities, and functions and their graphs (adapted
from 1989 Illinois State Board of Education testing materials, with
permission). Although the problem obviously does not qualify
as other than a lower level assessment item, it is
interesting to note that, as presented, it looks quite different and
certainly does not operationalize the desired goal:

■ Which of the following number sentences could be used to
solve the following problem? Bill weighed 107 pounds last
summer. He lost 4 pounds, and then he gained 11 pounds.
How much does he weigh now?

a. 107 - (4 + 11) = A
b. (107 - 4) + 11 = A
c. (107 + 11) + 4 = A
d. -4 + 11 = 107 + A
e. (107 - 11) + 4 = A

The issue is not that one has to solve the problem, but that one has
to analyze notations that no intelligent person would use to solve 
the problem. It is an almost perfect example of an elaborated 
problem item that does not attain its desired goals. It may be very 
difficult to decide whether or not a problem has a camouflaged or 
elaborated context. 

Of greater complexity is the role of context in the following
test item. It seems, on the face of it, to fit into the middle-level
category, but the role of the context is deceptive. A model train
track with a short and a long circuit is illustrated, together with the
time (10 min) the train takes to complete the short track. The question
is, How long will it take the train on the longer track? (van den
Heuvel-Panhuizen & Gravemeijer, 1990).

This is a simple question with an elaborated context, yet it is nevertheless
a middle-level test item because a thorough analysis of the
problem is required and because of the built-in stratification;
that is, the way the item is presented allows for solutions on
several levels. However, the elaboration character has to be
restricted to the train (it could be a car or a bus), whereas the map is the
relevant context.

Relevant and Essential Context. We begin with a simple test for
[remainder of sentence missing in the source].

[Figure: two test items, A and B. Item B shows a "Children's Party"
program that starts at 14.00 with singing to the birthday boy/girl
and drinking lemonade.]



In A, the children are asked to buy "something" and circle the
number that shows the money left in one's purse. Although there
are several degrees of difficulty here, the choice gives indications of
what children are capable of. Of course, preferences for a certain
object play a part (relevant and essential context). Experiments
have shown, however, that quite a few children make numerically
similar choices on tests of this kind (van den Heuvel-Panhuizen &
Gravemeijer, 1990. Used with permission.)

Other types of problems show again the "own production" 
aspect of tests coupled with relevant context use. In B, the children are 
asked to make a program for a birthday party, or rather, to complete 
it; the starting time and the activities are already given. It is left to the 
children to determine how long each activity will take. The only thing 
that is predetermined is 45 minutes for the movie. As with most open
items, this one allows a great many observation points: There must be
progression of time, duration must be consistent with the activities
planned, and finally, digital notation of time must at least be understood
(van den Heuvel-Panhuizen & Gravemeijer, 1990).

As a final example of the relevant use of context, we have 
chosen a standardized test item that was revised in an effort to 
improve its efficacy: 

Among other things the quality of water in a swimming pool is
judged on the basis of the amount of urea. Urea enters the water
via perspiration and urine. It appears that the average daily
increase in the amount of urea is 500 g per 1,000 visitors a day.

The water must be refreshed in such a way that the statutory
standard of 2 g urea per cubic meter (m³) will not be exceeded.

In the model, we make the assumption that 1,000 swimmers
visit the pool, which has a volume of 1,000 m³, daily. The
refreshing of the water takes place at night. For each daily
visitor, 30 liters (l) of water will be refreshed. In our model, this
means a refreshment of 30 m³ (3% of the total).

The first day we start with 0 g of urea in the water. At the
end of the day, the water contains 500 g urea. After refreshing,
there will be 485 g urea left at the beginning of the
second day.

— Show by calculation that the amount of urea is more
than 955 g at the beginning of the third day.

— In the course of which day will the statutory standard
be exceeded?

A refreshment of 30 l per visitor is not sufficient. Suppose in
the model, 200 l will be refreshed instead of 30 l. Let U be the
amount of urea at the beginning of a certain day. Show that the
amount of urea is 0.8U + 400 at the beginning of the next day. In
our model, we start again with 0 g urea at the beginning of the
first day. The amount of urea (U_n) at the beginning of the nth
day can be calculated directly with the formula:

U_n = 2000 - 2500 · (0.8)^n

— Explain with the aid of this formula that at the beginning
of each day the amount of urea will meet the
requirements. In the course of the day the statutory
standard can be exceeded.

— On which day will this happen for the first time?

(HAVO, National Examination, The Netherlands, 1991)
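
The model behind this item is a simple recurrence: each day 500 g of urea is added, and each night a fixed fraction of the water (3% for 30 l per visitor, 20% for 200 l) is replaced. The sketch below simulates it under two assumptions that the item leaves implicit: the statutory limit for a 1,000 m³ pool is taken as 2 g/m³ × 1,000 m³ = 2,000 g, and "exceeded in the course of the day" is checked at the end of the day, just before refreshing. The function names are illustrative.

```python
def urea_at_start_of_days(refresh_fraction, days, daily_increase=500.0):
    """Return [U_1, ..., U_days]: grams of urea at the start of each day."""
    amounts = [0.0]
    for _ in range(days - 1):
        end_of_day = amounts[-1] + daily_increase
        amounts.append(end_of_day * (1.0 - refresh_fraction))
    return amounts


def first_day_limit_exceeded(refresh_fraction, limit=2000.0,
                             daily_increase=500.0, max_days=1000):
    """First day on which the end-of-day amount passes the limit, or None."""
    start_of_day = 0.0
    for day in range(1, max_days + 1):
        end_of_day = start_of_day + daily_increase
        if end_of_day > limit:
            return day
        start_of_day = end_of_day * (1.0 - refresh_fraction)
    return None


if __name__ == "__main__":
    print(urea_at_start_of_days(0.03, 3))      # start of days 1, 2, 3
    print(first_day_limit_exceeded(0.03))      # 30 l per visitor
    print(first_day_limit_exceeded(0.20))      # 200 l per visitor
```

With a 3% refresh the start-of-day amounts are 0, 485, and about 955.45 g, and the limit is first passed during day 5. With a 20% refresh the simulated values agree with U_n = 2000 - 2500 · (0.8)^n, which stays below 2,000 g at the start of every day, while the end-of-day amount first passes 2,000 g on day 8.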

We are not often in a position to describe in detail how a rather
complicated item like the last one is constructed. But in this case, we
can look over the shoulder of the test developers. The test designers
(Roodhardt, personal communication) first note that the construction
of a good test item takes considerably more time than solving
the problem. It is first necessary to find a source of potential test
content. Some people have a special ability to recognize prospective
sources. Libraries are good sources of material. In this particular case,
the scientific magazine H2O offered an article that was used as a
source. One makes this search with certain principles in mind:

• The story should fit with the philosophy of the curriculum;

• The problem must bear some relevance for the students;

• The text must inspire additional questions;

• It must be possible to design a string of good questions at
examination level. Most of the time we are forced to
simplify the article to the level treated in the classroom,
but in this particular case, this was not necessary.

One of the people involved in the design process takes the lead in
developing a first draft that illustrates the potential of the context
to the other team members. After discussing the draft, it is
concluded that it offers attractive possibilities. Especially attractive
are the questions concerning the point at which the safety
level will be passed. But a number of hurdles are still to be taken.
The draft version in this case contained, for instance, the following
example:

a. Each visitor delivers 0.5 g urea.

b. At the end of the day there is 1000 · 0.5 g urea.

c. Per liter water: 500 / 1000 = 0.5 g/m³ = 0.5 mg/l.

d. Fresh water: 30 · 1000 = 30,000 l.

e. Disappearing: 30,000 · 0.5 mg = 15,000 mg = 15 g.

f. Only 485 g urea is left.

— Compute in this way the amount of urea at the end of the
second day and at the end of the third day.

— On which day will the statutory standard be exceeded?

— The legal standard is 2 mg/l urea. Will this norm be
exceeded after three days?

It was concluded during the discussion that this was not a good 
format; The exemplary computation should be replaced by some- 
thing more substantial. The whole problem is based on these 
computations. Therefore, mistakes in this phase either are not to 
be allowed or we must change the string of follow-up questions. 
The essential question is about exceeding the standard. But we 
could give away a little bit about the computation — give the 
student a checkpoint. If this fails, it tells us something significant 
about the level at which the student is working. 

Thus, a new version begins to take shape. Among other 
things, the quality of water in a swimming pool is judged on the 
basis of the amount of urea. Urea enters the water via perspiration 
and urine. It appears that the average daily increase in the amount 
of urea is 500 g per 1,000 visitors a day. The water has to be refreshed
in such a way that the statutory standard of 2 mg per liter will not
be exceeded. In our model, this means that if 1,000 people visit daily
a pool with a volume of 1,000 m³, the statutory standard for
refreshment is 30 l per person per day. This means that at night we
refresh 30,000 l (or 3 percent of the total). On the first day, we begin
with 0 g of urea in the water; at the end of the day, the water
contains 500 g of urea. After refreshing, 485 g of urea will remain.

■ Compute the amount of urea at the beginning of the first
day.

■ Assume that the amount of urea at the beginning of a day
is U g.

■ How large — expressed in U — is the amount of urea at the
beginning of the next day?

■ On which day will the statutory standard be exceeded? 

Upon reflection, we decided that the computation based on
the standard for urea of "2 mg/l" could cause misunderstanding
that, in turn, might cause problems when grading the tests. So this
was changed to "2 g/m³."

In the final discussion, the test takes the form described earlier. 
The first two questions are partially answered by telling the students
to show that at the beginning of the third day the amount is 955 g. 
And they are asked explicitly to explain their answers. Also, the 
decision was made to use a photograph with the test, in the hope that 
this would have some psychological effect on the students. 

One aspect of this item that was discussed in detail was 
context. It was clear that context plays a relevant role and that the 
students would recognize the real-world quality of the problem. Its 
context was both real and relevant. 

Real versus Artificial

It is clear from the previous discussion the extent to which the 
authors considered context in the design of this problem — the 
relevance and the reality of this problem for the students for whom 
the examination was designed. In general, the problem was well 
received both by students and their teachers. But the context of the 
following problem was a different matter! 

■ As the result of the increase in space travel in the twenty- 
first century, a new disease from outer space struck the 
inhabitants of the earth. The graph shows the number of 
victims of this disease on the planet for the years 2079-2086.

— Draw a new graph on logarithmic paper for the number 
of victims. 

— During which period is the increase in the number of 
victims nearly exponential? 

— Compute up to one decimal point the annual growth 
factor in cases during this period. 

Those suffering from the disease were primarily space
travelers and employees at space centers. The pie chart
shows the distribution of victims in 2086. The number of
infected people in The Netherlands in 2086 was space
travelers, 60; employees at the space center, 5; relatives of
space travelers, and others, 2.

— Make a pie chart for the situation in The Netherlands.

— Investigate whether in 2086, among patients in The
Netherlands, there were significantly more space travelers
than the 11 percent for the whole earth, with a
significance level of 1 percent.

In a hospital, the disease is treated with the medicines
R1 and R2. Every patient gets 600 mg of R1 and 190 mg of
R2. Both medicines can be made from raw materials A and
B. Every kg of A yields 60 mg of R1 and 10 mg of R2 and
every kg of B yields 80 mg of R1 and 1.8 mg of R2. Compute
the minimal number of kg of raw material (A and B
together) needed for one patient.

The cost for A is $15 per kg and the costs for B are
variable. The hospital tries to get the raw material for
minimal cost per patient. Compute for B when it is cheaper
to make the medicines R1 and R2 from A only. (VWO
National Examination, The Netherlands, 1987)

The context is clear from the first sentences, but we have
presented the complete problem to give an honest picture of the
process. A first reaction to the context might be that it is artificial
to project a century from now, to talk about a space-related disease. 
It might even appear to be an elaborated problem because of 
contrived information, such as R1 and R2 and A and B. It definitely 
does not offer the students a real-world context — apart from the 
fact that the mathematics is all too real for them. And the argument 
can be made that its relevance leaves something to be desired. 

Nevertheless, the information presented in the problem is 
scientifically grounded and based on a source article the contents 
of which are both very real and relevant. The problem was built 
around real data: The time span was 1979 to 1986, and the space 
disease was in reality AIDS. But the designers of this item felt that 
it was not a good idea to confront students under examination 
conditions with this highly emotional issue. And here we reach the 
heart of the matter. 

It seems evident that when we put so much emphasis on the 
importance of mathematics education in preparing our students to
be intelligent and informed citizens, we have to deal with all sorts 
of real-world contexts. We have to deal with pollution and its very 
political implications. We cannot avoid politics, with its multitude 
of subjective components. Traffic safety is an important matter in
general, but one with a very emotional component for the many 
students aware of casualties in their families. Health is perhaps one 
of the most important issues at this time for many people. The 
fitness trend is still very strong and identified with positive bodily 
care. But to discuss cancer, Alzheimer's, heart disease, and for that 
matter the effectiveness of certain treatments — whether or not in 
relation to the costs of health care as a political issue — presents
subtle and not so subtle challenges. We recall vividly an incident 
that took place in one of our experimental schools. Statistical data 
presented in a textbook problem showed an exponential growth in 
the number of abortions in different countries (excluding the 
students' own country) and those numbers (not the subject of 
abortion) were discussed. The page on which this problem occurred 
was torn out of the book at one school because a student or a
student's family believed that abortion should not be discussed at
school and certainly not during mathematics. 

Over time, we have noticed a gradual change in that attitude:
A greater and greater number of real issues can be discussed at
school, if we remain attentive to their emotional, psychological,
and political aspects. However, it is also clear that at the teaching
and learning level, we may be able to use contexts that are not
possible at the assessment level. We agree, for instance, that in
1987, to use AIDS as a context was not without risks because of the
highly emotional and uncertain aspects of the disease at that time.
Now, only five years later and with somewhat more knowledge 
regarding the growth or nongrowth of the disease, we can imagine 
the possibility of AIDS as a context in a classroom discussion or 
maybe even on an examination. But it is clear that there is risk in 
using real contexts on tests. 

Another test item that further illustrates the kinds of prob- 
lems we face in bringing the world into the classroom does not 
seem to deal with a volatile context:

■ In a certain country, the national defense budget is $30
million for 1980. The total budget for that year is $500
million. The following year the defense budget is $35
million, while the total budget is $605 million. Inflation
during the period covered by the two budgets amounted to
10 percent.

— You are invited to give a lecture for a pacifist society. You
want to explain that the defense budget has been decreasing
during this year. Explain how to do this.

— You are invited to lecture to a military academy. You
want to explain that the defense budget has been increasing
this year. Explain how to do this. (de Lange, 1987; see
also MSEB, 1991. Used with permission.)

This item precipitated a number of conflicts in the classroom 
originating from the basic question: Is it ethical to teach students 
how to be manipulative themselves rather than to show them
examples of "manipulation" by others? In the process of teaching
them to recognize how data may be manipulated, is it appropriate
to ask them to do it themselves? These questions are directly
related to the problem of real contexts: Should we discuss such a
controversial matter as defense spending in a classroom situation,
let alone use it in assessment? The following student discussion 
bears thoughtful scrutiny: 

Marijn: I think you have to see it as a percentage — 30 of
the 500 and 35 of the 605.

Marc: 500 of the 30, that's 100/6.

Marijn: The other way around — that's 0.06.

(Marc then calculates 35/605 on his calculator: 5.78.)

Marijn (says to Susan): Write that down. 

Susan (asks): What is that the answer to? 

(Marijn tells her and then dictates the answer.)

Servaas: This one is really too simple.

Marijn: Aren't we supposed to do something with the 
inflation? 

Marc: O, (expletive)! 

Servaas: If you ask me, that has nothing to do with it. 

Marc: The inflation applies to both amounts so they 
cancel each other out. 

Servaas: In the second one, it just increased from 30 to 
35. 

(Susan doesn't agree at all: The inflation lies in between
the two numbers, so you have to figure it out
for the second one.)

Marc: And you have to add 10% extra to that 605. 

Servaas: It doesn't say that. 

Marijn: But in the next one you have to do it. 

Marc: You add on the inflation, but you don't mention 
that there was any inflation, so the difference is even 
larger. 



The arguments held by the pacifist group and by the military
academy overlap each other somewhat, making the role to be
played by inflation rather unclear. The following four minutes
hardly contribute to the solution. Marijn calculates 605 and 35
backwards on a basic annual level (605, that's 110%, so you divide
that by 11 and then subtract it, so that used to be 550) and then
establishes that 31.8/550 is 5.78. But Marc had already pointed this
out in the beginning with his "canceling out." Marc does, however,
get an idea from Marijn's calculations:

Marc (says to Susan): If you subtract the inflation from 
35 you get 31.8. This 31.8 is much less in relation to 
605 than 35. So you have to subtract inflation from the 
35 but not from the 605.

Susan: That sure is stupid. 

Marc: Yeah, but you have to do your best to sell it, so it 
should be O.K. to fiddle a bit. 

Susan: That's ridiculous. 

Servaas: You can't do that. 

(Marc leafs through the booklet: They are doing that 
all the time.) 

Marc: O.K. It's not all right, but if you're on the side of 
the pacifists. . . . 

Servaas: But then calculating the inflation for both.

(Marc asks Marijn what she thinks, but she had lost
the thread of the conversation. He explains it once 
more but Marijn, too, rejects his solution: That's no 
longer objective.) 

Marc: But they don't have the data. 

Servaas: And at a certain point you say whatever seems 
to fit. 

Susan: I'll write it down. 

Marijn: Marc sure knows how to deal with pacifists.

Susan: Double points. Go ahead Marc.

(Marc dictates his solution.) 

The discussion shows that the context is all too real for the students 
but they hesitate to use the numbers to their benefit, making the 
assumption that you have to be objective. This conversation was 
taken from a classroom where the students were working on 
solving this problem in small groups. One can imagine what would 
happen if we had given this problem as part of our assessment, 
whether as group work or individually. 
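
The figures the students juggle in the defense budget item can be laid out explicitly. The sketch below only restates the arithmetic from the item and the transcript; the function name `budget_views` and the dictionary keys are illustrative, and the "mixed" share at the end reproduces Marc's deliberately inconsistent comparison rather than a recommended method.

```python
def budget_views(defense_0=30.0, total_0=500.0,
                 defense_1=35.0, total_1=605.0, inflation=0.10):
    """Several ways of comparing the two budgets (millions of dollars)."""
    real_defense_1 = defense_1 / (1 + inflation)  # in first-year dollars
    real_total_1 = total_1 / (1 + inflation)
    return {
        "share_first_year": defense_0 / total_0,
        "share_second_year": defense_1 / total_1,
        "real_share_second_year": real_defense_1 / real_total_1,
        "nominal_growth": defense_1 / defense_0 - 1,
        "real_growth": real_defense_1 / defense_0 - 1,
        # Marc's "fiddle": deflate the 35 but not the 605.
        "mixed_share": real_defense_1 / total_1,
    }


if __name__ == "__main__":
    for name, value in budget_views().items():
        print(f"{name}: {value:.4f}")
```

As a share of the total budget, defense falls from 6.0% to about 5.79%, and deflating both amounts leaves that share unchanged, which is Marc's "canceling out." In absolute terms the defense budget rises by about 16.7% nominally and about 6.1% after inflation. Deflating only the 35 gives about 5.26%, the larger "decrease" that the group hesitates to present.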

Let us turn finally to a problem within an artificial context: 

■ Somewhere out in a remote area, where people are 
rarely seen, a mysterious factory exists. Above the 
entrance hangs the sign "Cote d'Or." According to 
whispered rumors, the alchemist Ben Al-K'wasi is 
creating golden Christmas ornaments out of clay by 
means of a complicated procedure. The new orna- 
ments are made entirely of clay. After a year of matur- 
ing, they turn silver and, after one more year, they 
become true gold! If they do not break, they will 
remain gold. If an ornament breaks, it dissolves forthwith.
In 1983, the factory attic contained 217 clay
ornaments, 128 silver ones, and 70 gold. By means of
extremely uncommon earthly rays, the silver and gold
ornaments can produce young: Two silver ornaments
are able to produce one young clay ornament. In 1984,
there were 288 clay ornaments and 98 gold ones. In
1985, there were 220 silver ornaments. Due to various
causes, 70 percent of the golden ornaments break
yearly.

— Based on this data, draw up the corresponding Leslie
matrix (for a period of a year).

— Calculate the missing data for 1984 and 1985. (From a
school examination in The Netherlands, 1986)

This problem may be successful in an appropriate classroom 
climate, where such exercises are part of the didactical contract
between the teacher and the students. It is, however, an almost 
perfect example of an artificial context, which may irritate stu- 
dents. In addition, with nothing to support the context, there is 
certainly no need to reflect. The imaginary context would tend to 
distract rather than support their efforts, and the teacher would 
have trouble analyzing the results. 

The Distance to the Students' World

If we simply reflect on the examples provided, it will be clear that
a valid topic for discussion is, How real is the real world for the
student? The American Curriculum and Evaluation Standards for
School Mathematics (NCTM, 1989) stresses the importance of
motivating contexts like pop music charts and baseball. A motivational
factor may be necessary when we use a problem to introduce
mathematical concepts, but when it comes to applications we
cannot be too prudent in describing the boundaries of the students'
real world.

We mentioned previously that flying formed a real-world 
context that is very rich in relation to mathematics. Late in the 
1970s, experiments were carried out putting simple trigonometry
and vectors in the context of flying, aiming at lower and middle 
ability students of 13-14 years of age. The experiments were
satisfying and a booklet on them was published. Shortly thereafter 
we received complaints to the effect that putting mathematics in 
a flying context was not fair because this subject puts boys at 
greater advantage and was unfit for girls. Although the message was
clearly understood, it came as a complete surprise to those who 
complained that the booklet was tested at a school where the 
students were almost exclusively girls. 

But although the dilemma was clear, it still was not satisfactorily
resolved. Should we offer girls only female contexts in order
to emancipate them? Or is it much better to offer them a supposedly
male context for emancipation? During the experiments, we
found that the context motivated boys and girls equally and that
boys enjoyed no major advantage because of the way the information
was presented.

Since that time, we have solved the problem (for the moment)
by offering as many different contexts as possible, both during the
classroom presentations and during assessment. Slowly,
rather specific "female" contexts are entering the tests — especially
at the primary level, where this problem seems easier to solve than
at higher levels. This may be caused in part by the fact that at
primary level, and at lower secondary level as well, we tend to
define the real world as the world that is really known to the
students, or can be imagined by them. In other words, the distance
to the students' world is close to zero.

Several test items on the primary level that have been selected
at random follow:

■ An ice cream vendor has computed that if he sells 10 ice
creams, they will consist of: 2 cups, 2 cones, and 5 sticks.
He orders [illegible]00 ice creams. What distribution of the different
kinds will he use?

■ You need to know how much water your water barrel can
contain. What are you going to compute?

— the perimeter
— the area
— the volume
— the weight

■ Annie wants to know the area of the island. She does that
by putting a grid on the map of the island. What will the
area of this island be?

■ Frank runs around this sports field five times. How many
km did he run?

■ A pack of papers containing [illegible]00 sheets is [illegible] cm thick. How
thick is one sheet of paper?

All of these problems are situated, more or less, in the 
students' daily life. But the quality or nature of real contexts
changes as we advance in the educational system. We have already 
seen the different real worlds that students have to cope with if they 
are at the high school level. After observing these developments 
rather carefully in The Netherlands during the past decade, we have 
seen the following picture evolve: At the primary level, students 
are dealing with their "own” real world, including fantasy worlds. 
But at secondary level, the picture is different: In the first place, we 
notice that the students are becoming increasingly a part of the real 
world, including the scientific and political worlds. But we also see 
that, in effect, we delay this process for the lower ability groups. 
Here we stay much longer with the day-to-day real world, without 
any assurance that this is justified. At the same time, we arc seeing 
female real worlds appear in tests for the first time. The following 
is taken from the new final examination at the lower level in The 
Netherlands in 1991. 

■ Wilma's sister has joined the majorettes and therefore 
needs a circle skirt. Wilma has promised to make her one.
She has made a sketch of the pattern of the skirt and has 
indicated the measurements: waist .S6 cm, length of the 
skirt 40 cm. 

— Draw the pattern for the half circle of fabric to scale. 

Write down your calculations.

— Wilma buys a piece of cloth measuring 90 cm in width. How
long should the piece be? (Round off to 10 cm.) (W12-16,
team, 1991. Used with permission.)

This context, from daily life, is oriented to girls, close to their
real world; it is not as relevant to boys. The next item, meant for a slightly
lower level, already illustrates a somewhat more "scientific" world:

■ The flowers on this stem grow next to each other: 




The first flower uses 1/3 of the nutrition that is transported
through the stem. The second flower uses 1/3 of
the nutrition that is left over. The third flower uses
1/3 of what is left over by the second, etc.

— Complete the table:

Flower #             1      2      3      4      5      6
Part of Nutrition    1/3    2/9    ___    ___    ___    32/729



(Boertien, National Center for Educational Evaluation, CITO, 
Arnhem, The Netherlands, 1990. Used with permission.)
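
The table entries 1/3, 2/9, and 32/729 are consistent with every flower taking one third of whatever nutrition reaches it. Under that assumption (the function name and the `take` parameter are illustrative, not part of the test item), the missing entries can be sketched with exact fractions:

```python
from fractions import Fraction


def flower_shares(n_flowers, take=Fraction(1, 3)):
    """Share of the original nutrition used by each flower along the stem.

    Assumption: every flower uses the same fraction `take` of what
    reaches it and passes the remainder on to the next flower.
    """
    shares = []
    remaining = Fraction(1)
    for _ in range(n_flowers):
        used = remaining * take
        shares.append(used)
        remaining -= used
    return shares


for number, share in enumerate(flower_shares(6), start=1):
    print(f"flower {number}: {share}")
```

The sixth value produced this way agrees with the 32/729 printed in the table, which supports the one-third reading.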

It will also be clear that not only is the context somewhat less 
close to the student, but that the context is rather artificial and that 
nothing relevant is done with the context. A similar context — the 
growth of plants — is used in a very different way in a standardized 
examination, where it is clear that, although the proximity to the 
students' world is remote, the problem may nevertheless be very 
real to them. 



[Figure: soil profile with the groundwater level marked at 0]



■ In a nature area, the groundwater level lies at a soil depth 
of 90 cm. Ten cm above the groundwater level, the 
moisture content of the earth is approximately 32%. 
The higher the ground above groundwater level, the 
lower is the moisture content of the earth. At 80 cm 
above groundwater level, for instance, the moisture 
content of the earth has decreased to 4%. The relation 
between the height of the groundwater level and the 
degree of moisture content is indicated by the formula: 
Here H indicates the height above the groundwater level,
expressed in cm, and p indicates the degree of moisture
content, expressed in percentages. The formula can be used
for H between 10 and 80. Draw a graph of the relation
between H and p on the figure on the worksheet.

The area is going to be planted with vegetation whose 
roots require a moisture content between 5 percent and 10 
percent at their maximum depth. Calculate which heights 
above groundwater level will be suitable for this. 

The maximum root depth (in cm) of a plant in this 
nature area we will call R. Give a formula where p is 
expressed in R. 

The groundwater level in this nature area is now 
raised 30 cm. The relation between the height above 
groundwater level and the degree of moisture content 
remains the same as in the earlier situation. Calculate the 
new degree of moisture content of the earth at a depth of 40 
cm (HAVO National Examination, The Netherlands, 1990).
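
The formula itself is not reproduced in the text above. As a purely illustrative sketch, the inverse proportion p = 320 / H is assumed here because it matches the two data points given in the item (32% at H = 10 and 4% at H = 80); the examination may have used a different formula. The function names are hypothetical.

```python
def moisture(height_cm, k=320.0):
    """Moisture content in percent at a height H above groundwater.

    Assumption: p = k / H with k = 320, chosen only because it fits
    the two values quoted in the item; stated for 10 <= H <= 80.
    """
    if not 10 <= height_cm <= 80:
        raise ValueError("formula is only stated for H between 10 and 80")
    return k / height_cm


def suitable_heights(p_low=5.0, p_high=10.0, k=320.0):
    """Range of heights where moisture lies between p_low and p_high."""
    # Moisture falls as H grows, so the upper moisture bound gives the lower height.
    return k / p_high, k / p_low


def moisture_at_depth(depth_cm, groundwater_depth_cm, k=320.0):
    """Moisture at a soil depth, given the depth of the groundwater level."""
    return moisture(groundwater_depth_cm - depth_cm, k)


print(suitable_heights())         # heights for 5% to 10% moisture
print(moisture_at_depth(40, 60))  # groundwater raised from 90 cm to 60 cm depth
```

Under the same assumption, a root depth R in the original situation corresponds to H = 90 - R, so p = 320 / (90 - R).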

The "distance" of the real world is a factor that we have to
consider, together with the degree of "reality" of the context. It is
very difficult to be sure when a context is real and when it is close
to the student. We must bear in mind that, in general, we should try
to use real-world contexts, but the AIDS example made clear how
complicated this can be. On the other hand, what is real for one
student is not necessarily real for another. This reinforces the
importance of offering a wide variety of contexts, hardly a revolutionary
conclusion, but one that in fact is difficult to implement
under assessment conditions.



NECESSARY AND SUFFICIENT 

If we look at the previous examples, we find very few that do not
contain all of the necessary and sufficient information. It is so
natural to assume that we need all the information in the exercise
and that all information we need really is out there that we hardly
think it worthwhile to consider problems that are not of that form.
Even in more complex real-world problems, we can solve all
problems by analyzing carefully all the information, mathematizing
and organizing it if necessary, and using specific mathematical
tools and techniques to solve the problem. If we reflect for a
moment on the problem of lecturing to the pacifist society and
military academy, we notice that after a suggestion for a quick and
clean solution, the students were unsure because they did not use
the information regarding inflation. They were convinced that
something had to be wrong if they did not use all of the available
information.

If we teach real problem solving or, better still, if for the most
part we identify mathematics with problem solving, we have to bear 
in mind that usually the solutions do not come easily. In real life, we 
have to mathematize the problem and that means in the first place 
analyzing it to identify the relevant mathematics. This is a very 
difficult task and almost neglected in mathematics lessons. 

One issue is that of structuring a problem. In one or two 
previous examples, we have seen a structure that was provided to
help the students get started, but it is apparent that problems can 
be rendered more complex and real by omitting this prestructuring. 
This touches immediately the essence of the paradox involving 
testing and real-world problems: How do we present problems to 
the student in such a way as to optimize his or her chance of 
successfully solving them? This is a hard question to answer in the 
standard teaching-learning situation, but even more so in a test 
situation. It is more or less widely accepted that we should offer the
students the possibility to at least start successfully. To illustrate 
this point, it suffices to analyze here some of the items presented 
that seem more complex; it will become clear how carefully the 
designers selected the initial questions. These first questions 
function more as confidence builders for students than anything 
else. A further illustration of this point is the earlier discussion on 
urea pollution of a swimming pool. 

Thus, a question of major importance is: How can we offer
students good tests that contain problems that are more or less real,
as well as some guarantee that the students can make sensible
efforts at solving them? An example that I believe illustrates such
a problem is the following snowplow problem from the Mathematical
Contest in Modeling:

■ The solid lines of the map represent paved, two-lane
county roads in a snow removal district in Wicomico
County, Maryland. The broken lines are state highways.
After a snowfall, two plow-trucks are dispatched from a
garage that is about 4 miles west of each of the two points
(*) marked on the map. Find an efficient way to use two
trucks to sweep snow from the county roads. The trucks
may use the state highways to access the county roads.
Assume that the trucks neither break down nor get
stuck and that the road intersections require no special
plowing techniques. (Chernak, Kustiner, & Phillips, 1990.
Used with permission.)




It will be immediately clear that a lot of information is lacking 
if we really want to solve this problem in a realistic way. Two 
assumptions are already fixed; there are no breakdowns and no 
special techniques at intersections. But if we look at the results that 
college undergraduates turned in, we notice that a lot of information 
is lacking that is at least as important as that provided. We note that 

— there are two state highways 

— the state highways are clear of snow when the plows are
at work

— all county roads are paved, two-lane, two-way
— no new snowfall occurs after the plows begin
— there is clear weather (no accidents or interference)

— all roads need to be plowed
— the county is flat
— there is no mountainous area

— the plows may turn right or left, and may turn around at
intersections

— each truck has a 60-gallon fuel tank and averages 3 mpg
— each truck is equipped with a plow blade set at an angle of 45°

All of these assumptions were made by one team. 

Tests or tasks with redundant information are even harder to
find. This may seem strange because, in real life, we usually have
to solve problems with lots of redundant information. And some
evidence exists that it is not only students who find problems with
redundant information difficult to solve. One example, described
in de Lange, Burrill, Romberg, & van Reeuwijk (1993), is called the
rat problem. (Used with permission.)

During inservice teacher training courses in the early 1980s,
the following passage from a college textbook on biology was given
to upper secondary mathematics teachers:

■ [I]t might be interesting to estimate the number of offspring
produced by one pair of rats under ideal conditions. The
average number of young produced at a birth is six; three out
of those six are females. The period of gestation is 21 days;
lactation also lasts 21 days. However, a female may already
conceive again during lactation; she may even conceive again
on the very day she has dropped her young. To simplify
matters, let the number of days between one litter and the next
be 40. If, then, the female drops six young on the first day of
January, she will be able to produce another six 40 days later.
The females from the first litter will be able to produce
offspring themselves after 120 days. Assuming there will
always be three females in every litter of six, the total number
of rats will be 1,808 rats by the next January 1st, including the
original pair. . . .

Is the conclusion that there will be 1,808 rats at the end of the
year correct?

During the teacher training course, only 20 percent of the 
teachers were able to solve this problem within half an hour. As
they explained: "We feel we have all the tools to solve the problem, 
but we are unable to use them." 

On the other hand, nonmathematics majors (16 years old)
who had had about a year of "new" real-world mathematics
education at some of our experimental schools did very well on the
problem. Results depend, of course, on the test conditions. In the
classroom, with a limited amount of time, students find it very
difficult to solve, or even to schematize, the problem, but with no
time limit (for instance, having the problem as homework or a take-home
test), students produce fine results. This indicates that such
process-oriented activities are not well suited for testing by means
of time-restricted written tests.
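
One way to schematize the rat story is a period-by-period count. The sketch below is only one possible reading, with these assumptions made explicit: litters are dropped every 40 days (days 0, 40, ..., 360, so ten litter dates in the year), a daughter drops her first litter 120 days (three periods) after her own birth, and no rat dies. The function and parameter names are illustrative.

```python
def rat_census(litter_dates=10, litter_size=6,
               females_per_litter=3, maturity_periods=3):
    """Count breeding females per 40-day period and the total population.

    mothers[n] is the number of females dropping a litter at period n.
    Each of them keeps breeding in later periods, and her daughters
    join in `maturity_periods` periods after their birth.
    """
    mothers = []
    for n in range(litter_dates):
        if n == 0:
            breeding = 1  # the original female
        else:
            breeding = mothers[n - 1]
            if n >= maturity_periods:
                breeding += females_per_litter * mothers[n - maturity_periods]
        mothers.append(breeding)
    # Original pair plus every young rat born during the year.
    total = 2 + litter_size * sum(mothers)
    return mothers, total


mothers, total = rat_census()
print(mothers)
print(total)
```

Under these assumptions the count reproduces the textbook figure of 1,808, which suggests that the quoted conclusion is correct for this reading of the story.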

One girl came up with the solution in figure 4.2, which is
surprisingly simple.

Figure 4.2 One solution to the rat problem.

The schematized solution in figure 4.3 is from a teacher. The
teachers felt the need to produce a formula and consequently came
up with:

Figure 4.3 Another solution.



A completely different approach uses graphs and matrices (which 
are part of the "new" curriculum). The following graph represents 
the growth of the rat population and the graph can be represented 
by a matrix: 

Another possibility is to look for the nature of the growth
process. Comparing the number of rats period by period, we find
that the growth factor in the long run is equal to 1.86. This leads to
the formula

A =44 . 1.86" ^ 

n 

We leave it to the reader to integrate and generalize the 
different solutions — an activity that is representative of higher 
level mathematization. 
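
The long-run growth factor of about 1.86 can be checked numerically. The sketch assumes that the number of breeding females per 40-day period follows the recurrence M(n) = M(n-1) + 3 M(n-3), which is one reading of the rat story and not a formula given in the text; the growth factor is then the real root of x^3 = x^2 + 3.

```python
def long_run_growth_factor(periods=60, females_per_litter=3,
                           maturity_periods=3):
    """Ratio of breeding females in two successive late periods.

    Assumed recurrence: M(n) = M(n-1) + females_per_litter * M(n - maturity_periods),
    starting from a single breeding female.
    """
    mothers = [1] * maturity_periods
    for n in range(maturity_periods, periods):
        mothers.append(mothers[n - 1]
                       + females_per_litter * mothers[n - maturity_periods])
    return mothers[-1] / mothers[-2]


factor = long_run_growth_factor()
print(round(factor, 2))
# The same number should nearly satisfy the characteristic equation x**3 = x**2 + 3.
print(abs(factor**3 - factor**2 - 3) < 1e-6)
```

This links the period-by-period comparison to the characteristic equations and eigenvalues that some teachers used in their solutions.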

The biggest problem for most teachers was to mathematize
the problem in the first place. Here was a story with mathematical
aspects, but it confronted them with the problems: Which part is
relevant, and in which way are the relevant parts connected? They
also were very frustrated because they were unable to use the
powerful mathematical tools at their command to solve this
problem and were reluctant, at least initially, to accept the girl's
solution as a proper and even beautiful solution. Later, teachers
offered us very elegant solutions integrating all kinds of mathematics:
matrices, graphs, Pascal's triangle, characteristic equations,
eigenvalues, and much more. This opened up another area of
concern: how to compare and value such different solutions.
Many teachers were inclined to consider "mathematical" solutions
superior to solutions that did not use typical "mathematical
tools," like the girl's solution. This clearly illustrates that the belief
systems of teachers are harder to change than those of students,
not to mention those of test designers.

Everyone seems to agree that we should have more "rat
problems," and most certainly during the teaching and learning
process. But even then we face the matter of comparing the results.
Is the girl's solution really "without" mathematics, as some teachers
state? Or, given its simplicity, is it actually one of the best
solutions, as some other teachers argue? It is not at all clear, at least
on the "proof" level, that the solution of the girl is the result of
teaching her realistic mathematics.

Here another issue emerges. How can you prove that your 
educational efforts have changed the problem-solving attitude of the 
students? Although we have very strong feelings about that aspect of 
realistic mathematics education and mounting evidence of its effec- 
tiveness, it seems also clear that it is almost impossible to produce 
hard evidence of its efficacy. Of course, one has only to look at the 
central examinations in The Netherlands to see that they have 
changed considerably and that they are in fact testing problem-solving
abilities. But such tests are still far from tests like the rat problem. 

The teachers in our training course questioned where the girl
had learned to visualize the rat problem the way she did. Undoubtedly,
she had never encountered a similar problem, but she was
used to flexible use of representations and was able to transfer those
capabilities to quite another setting. The point to be noted here is
that the girl was not the only one with an elegant solution to the
problem: The majority of the students in her class solved the
problem with some kind of schema or visualization. But several
had managed it with simple, plain language and minimal mathematical
notation.

It seems clear, perhaps disappointingly clear, that we are only 
at the beginning of an exploration of this aspect of mathematics 
education and even further away from its consequences for assess- 
ment. 

An interesting example of a lack of fit between the informa- 
tion provided on a test item and the information needed for its 
solution is illustrated in a problem for grade 6: 

■ Katie bought 40 cents worth of nuts. June bought 8 oz. of
nuts. Who bought more nuts?

a. June.
b. They each bought the same amount of nuts.
c. Katie bought twice as much.
d. Katie bought 8 oz. more of nuts.
e. You cannot tell which girl bought more nuts. (Adapted
from 1989 Illinois State Board of Education testing
materials, with permission.)

This is interesting from different perspectives. In the first
place, it is encouraging that an American state board of education
had the temerity to try such an item. It is a breakthrough to give a
problem lacking sufficient information, certainly for grade 6. But,
of course, it raises certain questions, too. In the first place, what
does it tell the teacher that 61 percent of the students have
answered correctly: e? Or more precisely, what are we measuring
here and how certain are we that the correct answer reflects the
reasoning level being tested?

The idea behind the item is certainly appealing, but the 
multiple-choice format destroys its effectiveness. Imagine that the 
item had been expanded as follows: "Each of the following four
answers is correct if certain assumptions are made. Describe in
each of these four cases the necessary assumptions."

Now we have a completely different item. The students have 
to reason, to think, to write down their reasoning. With just a slight 
alteration, we have created a test item that operationalizes higher 
order thinking skills as well as communication skills. Of course, 
we lose ease and efficiency in grading. And we might even lose 
some objectivity in grading. But we gain so much more. Look at 
some of the possible answers: 

June bought more nuts if the nuts are more expensive than 40
cents for 8 oz.

June and Katie bought the same amount of nuts if the nuts are
40 cents per 8 oz.

Katie bought twice as much as June if the nuts are 20 cents per
8 oz.

Katie bought 8 oz. more of nuts if she bought 16 oz. for 40 cents.
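
The dependence of the answer on the unstated price can be made concrete. The sketch below reads the item as Katie spending 40 cents and June buying 8 oz. (an interpretation of partly illegible figures), and treats the price per 8 oz. as the missing assumption; the function name is hypothetical.

```python
from fractions import Fraction


def true_options(cents_per_8oz, katie_cents=40, june_oz=8):
    """Return Katie's amount in oz. and the options that hold at this price."""
    katie_oz = Fraction(katie_cents * 8, cents_per_8oz)
    options = {
        "a": june_oz > katie_oz,           # June bought more
        "b": katie_oz == june_oz,          # same amount
        "c": katie_oz == 2 * june_oz,      # Katie bought twice as much
        "d": katie_oz == june_oz + 8,      # Katie bought 8 oz. more
    }
    return katie_oz, [name for name, holds in options.items() if holds]


for price in (50, 40, 20):
    print(price, true_options(price))
```

Because June bought exactly 8 oz., "twice as much" and "8 oz. more" describe the same situation, so options c and d hold under the same price assumption.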

Modified as suggested, this simple test item would challenge 
the student to process the information and communicate these 
processes to others. And the teacher, rewarded with real feedback 
on the level of the student, subsequently can inform parents of their 
children's progress in a more informed way than with a score from 
a meaningless multiple-choice item. 



FORMATS OF TESTS
Multiple Choice

In constructing an achievement test to fit a desired goal, the test
maker has a variety of item types from which to choose. Multiple-choice,
true-false, and matching items all belong to the same
category: selection-type items. Officially, they have become popular
because they can be scored objectively. This means that equally
competent scorers can score them independently and obtain the
same results. These equally competent scorers are usually some
machine. And therein lies the real popularity of selection-type
items: They can be machine scored and therefore are very cheap to
administer.

The rules for constructing a multiple-choice item are simple. 
A multiple-choice item will present students with a task that is 
both important and clearly understood, and one that can be an- 
swered correctly only by those who have achieved the desired 
learning (Gronlund, 1968). That this is not as simple as it seems to 
be, we all know, especially if we hold that the item should 
operationalize a specific goal. 

To show how difficult the latter seems to be, we give an example
based on an item in the second IEA study (Travers & Westbury, 1989).



[Figure: map of the trip, with Downsville labeled]




■ John and Mary make a trip by car. They go from Atown to
Brocks, then to Chicity to Downsville, and back to Atown.
The total trip is one of 190 miles. (Board of Trustees,
University of Illinois, 1989. Used with permission.)

— What is the distance from Atown to Brocks?

a. 35
b. 40
c. 45
d. 55
e. 70

Of course, the item would be a winner if it were not for the
multiple-choice format. But what is far more serious is that this
item is meant to operationalize the goal: linear equations. So the
test-item designer is not only the test designer but also the solution
designer. Many educators would prefer that students solve this
problem in their own way, which, in this case, would seldom
include linear equations. More serious is that the item, as illustrated,
is flawed, seriously flawed, because we no longer know
what is being measured (which is not always bad), but we pretend
that we do know (which is very bad). "American students are very
poor in linear equations" could be a meaningless statement if it
were based on items like this.

Another frequently mentioned problem with multiple-choice
items is the assumption that only those who have achieved the
desired learning can answer the question correctly. A Dutch teacher
collected examples of student reasoning behind the answers they
gave on a nationwide standardized multiple-choice test (Querelle,
1984. Used with permission.)

{0} is the solution of

a. 3x = 3x
b. -3x = 3x
c. 3x = 3x + 1
d. -3x = 3x + 1

Most students chose a. But, fortunately, some students did
choose b. Rudy, certainly not one of the brighter students, was
asked to explain his choice of b:

Why b?

Well, because that's the only one that's 0.

Yes, but how come you're so sure?

Well, that's easy. With a, you get 3x + 3x = 0, that's wrong. And
with b, you get -3x + 3x = 0, so that's the one, because the others
are wrong too.
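
For comparison with Rudy's reasoning, the solution set of each option can be worked out directly. The sketch writes every option in the form a·x = b·x + c; the helper name and the string labels "all" and "none" are illustrative choices, not part of the test.

```python
from fractions import Fraction


def solve_linear(a, b, c):
    """Solve a*x = b*x + c: return 'all', 'none', or the single root."""
    if a == b:
        # The x terms cancel; what remains is 0 = c.
        return "all" if c == 0 else "none"
    return Fraction(c, a - b)


options = {
    "a": (3, 3, 0),    # 3x = 3x
    "b": (-3, 3, 0),   # -3x = 3x
    "c": (3, 3, 1),    # 3x = 3x + 1
    "d": (-3, 3, 1),   # -3x = 3x + 1
}
solutions = {name: solve_linear(*coeffs) for name, coeffs in options.items()}
for name, solution in solutions.items():
    print(name, solution)
```

Only option b has 0 as its single solution, so Rudy's choice is correct even though the rearrangements he describes are not.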

Let us look at a final example: 

■ The radius of a cylinder is 1 cm. The height is 2 cm. The
cover of the cylinder is a rectangle. What is the length of
this rectangle?

a. 2π cm
b. 4π cm
c. π² cm
d. 2π² cm

John: I have a as the answer.

Okay, let's hear it.

I did the area, so radius multiplied by radius multiplied by
π, and, well . . . uhh one times one is two, so two π. So a is
correct.

Babette: I have got a as an answer too, but different again.
I thought also area, so 1 × 1 × π and that doubled for the
height.

The construction of a multiple-choice item that is clearly
understood, is answered correctly by those who have achieved the
desired learning, and operationalizes a specific goal or learning
outcome is not a simple task. Many items have flaws, and all have
very limited value if our objective really is authentic assessment.
At this time, the only area of application seems to be operationalizing
the lower goals. And Travers and Westbury (1989) state, when
discussing the second IEA study, "The construction and selection
of items was not difficult for the lower levels of cognitive behavior:
computation and comprehension." (This is not completely in line
with the items as we see them; we refer once more to the linear
equations item.) "But," they continue, "the difficulties were presented
at the higher levels. The multiple-choice format necessitated
by the scale of the study presented some opportunities as well
as challenges." They then give an example:




A  (x − x₁)(x − x₂) > 0
B  (x − x₁)(x − x₂) < 0
C  0 < x < x₁

D  x > x₂
E  None of these



According to Oldham, Russell, Weinzweig, and Garden (1989),
this item as an open-ended problem would involve reading the graph;
that is, it would require comprehension behavior. If x₁ < x < x₂
were included as one of the choices, the item would involve the
computation or comprehension level of behavior. However, given
the choices offered as solutions, one has to analyze them and
discover the relationship they determine. This involves behavior
on the analysis level. In this case, the graphs of the functions
become one way of communicating certain information that must
be matched to other descriptions of the same data. This item
provides an example of the imaginative use of a multiple-choice
format to yield a problem at a higher level of behavior than an open
format would have used. This statement, however, needs further
analysis, just like the item itself. Let us first look at it from the
students' point of view. First, it is evident that C and D are not
correct; if C and D had been combined in one alternative, there
might be reason for giving such a response some consideration.
What we see at once, and it is a trivial matter, is that the answer
should be x₁ < x < x₂, if we understand the notation involved.
How does this relate to A and B? Easy: Take 1 and 3 for x₁ and x₂ and see
what happens in between if you take 2 for x. For B, this yields
(2 − 1)(2 − 3) < 0. Done.
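
The substitution shortcut described above can be written out in a few lines. The options are read here as A: (x − x1)(x − x2) > 0, B: (x − x1)(x − x2) < 0, C: 0 < x < x1, and D: x > x2; this reading is an assumption, since the printed options are hard to decipher, and the function name is illustrative.

```python
def check_options(x, x1, x2):
    """Evaluate each answer option for one trial value of x."""
    product = (x - x1) * (x - x2)
    return {
        "A": product > 0,
        "B": product < 0,
        "C": 0 < x < x1,
        "D": x > x2,
    }


# Trial values from the text: roots 1 and 3, and the point 2 between them.
result = check_options(x=2, x1=1, x2=3)
print(result)
```

A single substitution is enough to single out B, which is the point of the criticism: the item can be answered without any analysis-level behavior.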

Now the question is: What "behavior" has been tested? Did
we indeed soar to the analysis level? Of course not! We were at the
lowest level, substitution. Did we need to match certain information
to other descriptions of the same data? Not at all. Was it an
important question for the students? No, because we were not
dealing with a real problem. Was it a good item? Of course not. Did
it operationalize the intended (high) goal? Definitely not.

In conclusion, for us the case is not complicated. There is a 
place for the multiple-choice format. But only in very limited 
contexts and only for measuring the lowest learning outcomes. For 
higher order goals and more complex processes, we need other test 
formats — the most simple being "open questions." 

(Closed) Open Questions 

Multiple-choice items are often characterized as closed questions. 
This suggests that there are open questions as well. However, we 
have to be careful with such terminology: Sometimes the question 
is open in format, but closed in nature. A pair of examples will serve 
to illustrate this point: 

■ a. How many diagonals does a rectangle have? 
b. How many diagonals does a square have? 

■ The point (2, 3) lies in the first quadrant (I). Indicate for each
of the following points which quadrant it lies in.

a. (-2, 3) 

b. (2 

c. (3, 4) 

d. (-5, -6) 

These are rather extreme examples of so-called short-answer 
questions, a subdivision of open questions. Although they are,
technically speaking, open questions, they are very closed by 
nature. The respondent has to answer by a number, a yes or no, a 
definition, or maybe a simple graph or formula. Hardly any think- 
ing or reflection is involved. 

The distinction between (closed) open questions and (open) 
open questions is admittedly rather arbitrary, but this does not
mean that we should not pay attention to it when designing tests. 

(Open) Open Questions 

In our view, an (open) open question differs from the (closed) open 
question with respect to the activities involved in obtaining an 
answer. This answer can still be just a number or formula, but the 
process of obtaining it is slightly more complicated or involves higher 
order activities. This category differs from the next — that is, extended- 
response open questions — in that in the latter category, we expect the 
students to explain their reasoning process as part of their answer. To 
illustrate the difference, we submit the following examples: 

■ A fruit vendor buys a box of 6 pineapples for 60 cents. 
X pineapples are defective and therefore unsalable, but 
the rest he sells at 18 cents each. Write a formula for his
profit, P cents. 

■ A 400 m running track is to have two parallel straights 
of 80 m each and two semicircular ends. What should be 
the radius of the semicircles? 

■ The length of the equator is about 40,000 km. 

— How many kilometers is it from the North Pole to the 
equator, measured at the surface of the globe? 

— Imagine you can travel right through the earth. How far 
is it, in that case, from the North Pole to the equator? 
(MAVO, National Examination, The Netherlands, 1991).
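
The three items above each reduce to a short calculation once the situation has been understood. The following sketch shows one way to work them out; the function names are illustrative, and the earth is assumed to be a perfect sphere.

```python
import math


def pineapple_profit(x, bought=6, cost_cents=60, price_cents=18):
    """Profit P in cents when x of the pineapples are unsalable."""
    return price_cents * (bought - x) - cost_cents


def track_radius(total_m=400.0, straight_m=80.0):
    """Radius of the semicircular ends: the two ends form one full circle."""
    return (total_m - 2 * straight_m) / (2 * math.pi)


def pole_to_equator(equator_km=40000.0):
    """Distance from pole to equator along the surface and straight through."""
    radius = equator_km / (2 * math.pi)
    surface = equator_km / 4          # a quarter of a great circle
    tunnel = radius * math.sqrt(2)    # chord subtending a right angle
    return surface, tunnel


print(pineapple_profit(2))
print(round(track_radius(), 1))
print(pole_to_equator())
```

In each case the answer is short, but finding it requires choosing a model first (a linear formula, a circle made of two semicircles, a right-angled chord), which is what separates these items from the (closed) open questions.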

[Figure: globe with the North Pole marked]




The differences between the examples given in this category 
and the previous (closed) category are clear. The (closed) open 
questions referred to basic facts and knowledge: a definition, a 
simple drawing, a substitution involving neither process nor think- 
ing. The (open) open questions, although also requiring a short 
answer, were not just triggering questions, but questions that 
needed some thought and understanding and offered some possi- 
bilities for the student to solve the problem in his or her own way. 

Some of the questions on the National Examination involve 
not only some reasoning, but the students have to explain their 
reasoning. This "writing down" part can be regarded as a separate
goal in mathematics education, and it is a very valid one. If we stick 
to short-answer questions, we are not able to operationalize the 
communication goals. Extended-response open questions offer 
that possibility. 

Extended-Response Open Questions 

We offer an example from a national examination that represents 
the extended-response category, but with even less freedom than 
the fish farmers problem on pages 106-107. It shows clearly how
many different test formats we actually have at our disposal. 

In fall, the grapes that are ripe are harvested. The taste of the
grapes depends to a great extent on the moment that they
are harvested. If they can be left out in the sun a little
longer, the taste will improve. But if one waits too long
with the grape harvest, there is a chance of damage caused
by heavy and lengthy rainfall. A grape farmer has the
following choices for harvesting:

I. Immediate harvest:

The quality is "reasonable." Half of the harvest can be
sold for direct consumption; the proceeds in this case are
$2 per kilo. The other half of the harvest can be used only
for grape juice; the proceeds for this part are $1.30 per
kilo. With this option, there is no risk involved in the
harvesting.

II. Wait two weeks before harvesting:

The quality of the grapes in this case is "good." The
complete harvest can be sold for $2.30 per kilo. But to
wait two weeks involves some risk. If it rains on more
than two days during these fourteen days, the grapes
will be so damaged that they can be used only for grape
juice; in this case, the proceeds are only $1.30 per kilo.

The grape farmer can be sure of a harvest of 12,000 kilos. He
chooses the risky second way of harvesting. Compute the
advantages and the disadvantages that he faces compared
with the first strategy.

Meteorologists computed that for every day in this
two-week period, the chance for rain is 15 percent. Compute
the chance (probability) that it will rain in this two-week
period on more than two days.

The farmer chooses that way of harvesting which
promises the largest expected yield. Which strategy will he
choose? Illustrate your answer with a computation. (HAVO,
National Examination, The Netherlands, 1989. Used with
permission.)
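
A possible computation for the grape item is sketched below. It assumes a daily rain chance of 15 percent (the printed figure is partly illegible) and, as a modeling assumption not stated in the item, that rain on different days is independent, so the number of rainy days is binomial. All function names are illustrative.

```python
from math import comb


def immediate_proceeds(kilos=12000, table_price=2.00, juice_price=1.30):
    """Strategy I: half sold for consumption, half for juice."""
    return kilos / 2 * table_price + kilos / 2 * juice_price


def prob_more_than_two_rainy_days(days=14, p_rain=0.15):
    """P(more than two rainy days), assuming independent days (binomial)."""
    at_most_two = sum(
        comb(days, k) * p_rain**k * (1 - p_rain) ** (days - k)
        for k in range(3)
    )
    return 1 - at_most_two


def expected_waiting_proceeds(kilos=12000, good_price=2.30,
                              juice_price=1.30, days=14, p_rain=0.15):
    """Strategy II: expected proceeds when waiting two weeks."""
    p_bad = prob_more_than_two_rainy_days(days, p_rain)
    return (1 - p_bad) * kilos * good_price + p_bad * kilos * juice_price


print(immediate_proceeds())
print(round(prob_more_than_two_rainy_days(), 3))
print(round(expected_waiting_proceeds()))
```

Under these assumptions the chance of damaging rain is roughly 35 percent, and waiting still has the larger expected proceeds, although the farmer risks ending up well below the certain result of harvesting at once.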

Properly constructed open questions, with a variety of 
short, long, and extended responses, offer possibilities for assess- 
ment at a level that is above the lowest level — whatever name 
we give to the lower levels. They may be called knowledge 
outcomes, a variety of intellectual skills and abilities, compu- 
tation and comprehension, or basic skills and facts. Whatever 
the terminology used, it is generally agreed that we need other 
instruments like essay tests that provide a freedom of response,
which is required for measuring complex or higher order learn- 
ing outcomes. 

An old-fashioned but seldom used tool in mathematics education
is the essay test. As Gronlund (1968) stated: "Essay tests are
inefficient for measuring knowledge outcomes, but they provide a
freedom of response which is needed for measuring complex
outcomes. These include the ability to create, to organize, to
integrate, to express and similar behaviors that call for the production
and synthesis of ideas."

The most notable characteristic of the essay test is the 
freedom of response it provides. The student is asked a question
that requires producing one's own answer. The essay question 
places a premium on the ability to produce, integrate, and express 
ideas. The shortcomings of the essay task are equally well known: 
It offers only a limited sampling of achievement, the writing ability 
tends to influence the quality of the answer, and the essays are hard 
to score in an objective way. 

Essays can come very close to being extended-response open 
questions, especially in mathematics. The snowplow problem 
could be considered an example of an essay, which brings us 
immediately to an often-mentioned aspect of the essay: Should it 
be done at school or at home? Usually the essay task is seen as a 
take-home task. However, this is not necessary; one can readily 
think of simpler essay problems that could be done at school but 
require a day (or so). An example given at some fifty schools in The
Netherlands follows.

On the map, Amsterdam Street and Utrecht Street are the
main streets with a maximum speed of 30 mph. At the
intersections with Flamingo, Penguin, and Seagull Avenues
are traffic lights. To get as smooth a traffic flow as possible and
to minimize irritation, it is considered important that the
waiting time for red lights be minimized. If a driver has a
green light, he or she should catch a green light when arriving
at the next intersection, and so on for the next light. In this
case we talk of a "green wave." To get as close as possible
to a green wave situation, one has to consider, for instance,

— the duration of the green, yellow, and red light 
— the relation between the different intersections 
— the indication of advised travel speeds 

It is known that a traffic light cannot be red for more than 
a 90-second interval and that each green period should last 
a minimum of 5 seconds. On average, the traffic on the 
main streets is four times as intensive as on the avenues. 
How would you adjust the traffic lights for each of the 
following more complex situations? 

1. Bicyclists on Utrecht Street from north to south should 
have a green wave. 

2. Both bicyclists and car drivers should have a green wave 
from north to south on Utrecht Street. 

3. Both north and south traffic (cars and bicycles) on Utrecht 
Street should have a green wave as often as possible. 

4. Both north and south traffic (cars and bicycles) on Utrecht
and Amsterdam Streets should have a green wave as 
much as possible. Also take into consideration the 
traffic on Flamingo, Penguin, and Seagull Avenues: This 
traffic should have a reasonable flow too. 

What basic principles and considerations should you use in
general and in more complex situations? (National A-lympiade,
The Netherlands, 1990. Used with permission.)
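
One basic principle a group might arrive at is that, for a green wave in one direction, each light should turn green later than the previous one by the travel time between the two intersections. The sketch below illustrates only that idea. The distances between intersections and the cycling speed are hypothetical, since the map is not reproduced here; only the 30 mph limit comes from the task. The function name is illustrative.

```python
MPH_TO_MPS = 0.44704  # metres per second in one mile per hour

def green_offsets(distances_m, speed_mps):
    """Seconds after the first light turns green at which each later light should turn green."""
    offsets = [0.0]
    for d in distances_m:
        offsets.append(offsets[-1] + d / speed_mps)
    return offsets

# Hypothetical spacing between the three avenue intersections, in metres.
distances = [300, 450]

car_speed = 30 * MPH_TO_MPS      # speed limit from the task
bike_speed = 12 * MPH_TO_MPS     # assumed cycling speed

car_offsets = green_offsets(distances, car_speed)
bike_offsets = green_offsets(distances, bike_speed)
print([round(t, 1) for t in car_offsets])
print([round(t, 1) for t in bike_offsets])
```

With these made-up numbers the bicycle offsets are 2.5 times the car offsets, which hints at why serving both groups at once is harder than serving bicyclists alone.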

This example was meant for students at the upper secondary 
level, non-mathematics majors. The students worked in groups of 
four from 9 a.m. to 4 p.m. and were able to complete the task in a 
reasonable way. Note that the final question especially is an essay-like
question; the other four are more or less extended-response open
questions to make sure that the students get started in the first place.
Tasks like this can also be given to individual students, at home or at
school, depending on the goal that has to be measured.

It is generally accepted that the more precise or "closed" the
questions are, the more objective is the scoring. Viewing this item
from this perspective, one is tempted to conclude that this task can be
scored with reasonable objectivity or, better, in a good intersubjective
way. By this we mean that the scoring is done by two or more teachers
independently. Although this may not be feasible practice for routine
classroom teaching, it might be tried periodically with a fellow
teacher. Essay tasks are typically not feasible for routine classroom
teaching, so the incidental essay item might prove to be an excellent
tool for creating interaction at the content level between teachers, an
activity that is vital to good teaching at any school or institute.

It is interesting to compare this rather open problem with one 
that presents a similar context and problem but is meant for a 
timed-test situation at about the same age level and has to be 
completed within half an hour: 

■ Here you see a crossroads in Geldrop, The Netherlands,
near the Great Church.

[Figure: map of the crossroads near the church, with roads in the
directions N(uenen), E(indhoven), M(ierlo), and C(enter Geldrop).]

To let the traffic flow as smoothly as possible, the traffic
lights have been regulated to avoid rush-hour traffic jams.
A count showed the following number of vehicles had to
pass through the crossroads during rush hour (per hour):

The matrices G1, G2, G3, and G4 show which directions

have a green light and for how long. V' means that traffic 
can ride through a green light for a period of V' minute. 





— How many cars come from the direction of Eindhoven 
(E) during that one hour? And how many travel toward
the city center? 

— How much time is needed for all lights to turn green 
exactly once? 

— Determine G = G1 + G2 + G3 + G4 and thereafter T =
30G. What do the elements of T signify?

— Ten cars per minute can pass through the green light. 
Show in a matrix the maximum number of cars that 
can pass in each direction in one hour. 

— Compare this matrix to matrix A. Are the traffic lights
regulated accurately? If not, can you make another
matrix G in which traffic can pass more smoothly?
(van der Kooij, 1989).
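
The chain of computations asked for here (sum the phase matrices, scale to an hour, convert to a car capacity, compare with the count) can be sketched as follows. The matrices of the original item are not reproduced in this text, so every matrix below is a hypothetical stand-in: four phases of half a minute each, and an invented count matrix in place of A. Reading the factor 30 as the number of light cycles per hour is also an assumption; it fits these stand-ins because four half-minute phases make a two-minute cycle.

```python
from fractions import Fraction

DIRECTIONS = ["N", "E", "M", "C"]
HALF = Fraction(1, 2)

def phase(origin):
    """Hypothetical phase: traffic from `origin` has green toward every other direction for half a minute."""
    return [[HALF if (DIRECTIONS[i] == origin and i != j) else Fraction(0)
             for j in range(4)] for i in range(4)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(k, m):
    return [[k * x for x in row] for row in m]

phases = [phase(d) for d in DIRECTIONS]   # stand-ins for G1..G4
G = phases[0]
for g in phases[1:]:
    G = add(G, g)

T = scale(30, G)          # minutes of green per hour, assuming 30 cycles per hour
capacity = scale(10, T)   # at most ten cars per minute of green

# Hypothetical count matrix standing in for A (cars per hour, row = from, column = to).
A = [[0, 120, 90, 160],
     [100, 0, 140, 130],
     [80, 150, 0, 60],
     [170, 110, 70, 0]]

# Movements whose counted traffic exceeds what the lights can let through.
overloaded = [(DIRECTIONS[i], DIRECTIONS[j])
              for i in range(4) for j in range(4)
              if A[i][j] > capacity[i][j]]
print(overloaded)
```

With these invented numbers every movement can pass 150 cars per hour, so only the movements between N and C are overloaded. A student answering the final question would then shift green time toward those movements.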

It will be clear that we can easily turn this open question 
problem into an essay by simply asking the final question. 

Essay tasks have long been neglected in mathematics education
but certainly offer possibilities if carefully designed and if they
can be evaluated in an appropriate way. Another classical tool that
we use often as part of the teaching-learning process is the oral
presentation or discussion.

Oral Tasks

In The Netherlands, oral examinations at the national level have a 
long tradition. Until 1974, an oral component was part of the 
national examination. Oral tasks still constitute one part of the 
examination and may be used by individual schools to constitute 
as much as 50 percent of the final score. There are many different 
forms, of which we cite three: 

■ an oral discussion on certain mathematical subjects that 
are known to the students in advance, 

■ an oral discussion on a subject — covering a take-home 
task — that is given to the students for study twenty 
minutes prior to that discussion, 

■ an oral discussion on a take-home task (or similar 
alternative task) after the task had been completed by 
the student (and scored by the teacher). 

A global inventory of oral tasks at Dutch schools during the 
1980s shows that the oral task was almost exclusively used to 
operationalize process goals. The solution, or product, per se was
not important, and in many cases the examiners stopped the 
discussion short of the actual solution of the problem. 

It has sometimes been argued that the students who perform 
well on restricted-time written tests do equally well on oral tasks. 
This would suggest that there is no point in going through the 
elaborate process of doing oral tasks. It appears that this effect is due 
partly to the fact that similar questions are asked on written and 
oral tasks. In this case, it can hardly come as a surprise that the 
results have a high degree of correlation. 

We did some small studies that compared the results of 
different groups of students regarding the correlation between 
restricted-time written tests and oral tests — dealing, of course, 
with the same subject matter. One detailed study focused on two 
classes (twelfth grade) with twenty-eight students and two teachers
who practiced intersubjective scoring. The correlation between the
different results was 0.42, which seems to suggest that different
learning outcomes were tested with the different formats (de
Lange, 1987). At other schools, we carried out a similar study,
although in less detail, and obtained a remarkably similar result:
The correlation was 0.47, with seventy-three students involved.
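
One way to read these correlations is through the share of variance that the two sets of scores have in common, which is the square of the correlation coefficient. The short calculation below uses only the two reported coefficients. Interpreting the squared correlation this way is a standard statistical convention offered here as an illustration; it is not part of the original studies.

```python
# Reported correlations between written and oral results.
reported = {
    "two twelfth-grade classes (28 students)": 0.42,
    "other schools (73 students)": 0.47,
}

# Squared correlation: proportion of variance shared by the two formats.
shared = {label: r ** 2 for label, r in reported.items()}

for label, r2 in shared.items():
    print(f"{label}: r^2 = {r2:.2f}, about {100 * r2:.0f}% shared variance")
```

On this reading roughly a fifth of the variation in oral results is accounted for by the written results, which is consistent with the suggestion that the two formats tap different learning outcomes.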

It is not difficult to find protocols that "prove" that one can
get notably different results in oral tasks, compared with written
timed tests. Almost every teacher will have had that experience at
some time in his or her classroom. It is also interesting to compare
students' reactions to oral tasks: The results are sometimes surprising.
One usually low-achieving girl did surprisingly well on oral
tasks, surprising both the teacher and the student, who commented,
"I am in favor of including oral tasks. When I get stuck
with some detail in a restricted-time written test I usually get very
nervous. This makes it impossible to solve the rest of the problem
successfully. In this way I can try to prove that I really do understand
the subject matter." Another student who was excellent in
mathematics, according to his teacher and the results of written
tests, was incapable of interpreting the results of a matrix
multiplication during an oral test. This student said afterwards, "I am
personally not really in favor of oral examinations. In my opinion,
time is too restricted (total time per student was 20 minutes). There
was not enough time to think things over because keeping your
mouth shut may give the impression that you are trying to stretch
time. Another drawback is the fact that you cannot do much
computation. Finally, I don't like people watching and observing
me. ..."

The general conclusion of a larger group of students involved
in oral tasks was that positive aspects outweighed negative aspects.
Positive aspects of oral tasks that they mentioned explicitly were
these: more questions on insight and theory and on mathematizing,
less on arithmetic and computation; a good atmosphere that makes you
feel more relaxed; hints, because of which you will never get "stuck";
and not much attention to details, but more general questions. Negative
points mentioned were pressure due to lack of time, pressure due
to presence of officials, and little or no computation.

It will be clear from the preceding that oral tasks offer
excellent opportunities for testing, but that they need careful
preparation. The problems with oral tasks can easily be underestimated;
one is tempted to take into consideration the perceived
level of the student, ask questions similar to those on the written
tasks, allow too little time, or give help that is misleading (the student
may mentally be on a track other than the one the teacher
presumes). The teacher needs to take into account the fact that he
or she has to spend much more time on oral tasks than the time
spent with the student.

A completely different oral task than that just described,
which we can categorize as "discussion" between the examiner
and the student, is the "presentation" task. In this case, the student
is asked to prepare a presentation about a subject to be discussed
with the teacher. Those students planning to go on to higher
education need some experience in presentation on mathematical
subjects. Again, this area of learning and assessment is still, for the
most part, undeveloped. The format can be employed very successfully
if we include a discussion by the other students regarding the
presentation, especially if we let the students "grade" the presentation:
What are the strong and weak points and why are they strong
or weak? Such discussion will in most cases lead to reflection by
both the presenter and the teacher, as well as the audience, about
the subject. This is a format that needs further exploration.

Two-Stage Tasks 

Any combination of different test formats may be termed two-stage 
tasks. An oral task on the same subject as an earlier written task is 
a typical example. Some years ago we explored two-stage tasks that 
basically consisted of one task carried out in two stages. The 
characteristics of this restricted two-stage task combine the advan- 
tages of the traditional restricted-time written tests and the possi- 
bilities that more open tasks offer. The characteristics of the first 
stage are that all students are administered the same test at the 
same time; all students must complete it within a fixed time limit;
the test is oriented more toward finding out what students do not 
know than what they do know; usually the tests operationalize the 
lower goals, such as computation and comprehension; it consists of 
open questions; and scores are as objective as possible, given the 
fact that we exclude the multiple-choice format. These are then the 
characteristics of the first stage of the task. 

The second stage should complement the first and address 
what we miss in the first stage, as well as the goals we really want 
to operationalize. The characteristics of the second stage are the 
following: The test has no time limit; the test is done at home; the 
test emphasizes what the student knows, rather than what he or 
she does not know; much attention is given to the operationalization 
of higher goals, such as interpretation, reflection, and communica- 
tion; the structure of the test includes more extended-response 
open questions and essay type questions; and scoring can be 
difficult — intersubjectivity in grading should be stressed. 

The test, as indicated earlier, consists of a problem with a 
number of questions. An example was the forester's test that we 
presented in part on page 108. The complete text is given to the 
students during a restricted-time session at school. They then have 
to read the entire test (around three pages) and decide which
questions can be successfully completed in the classroom that day. In
principle, they are free to tackle any of the questions, but it came as no
surprise that most students did in fact choose the open questions with
short or long answers and left the essay questions for later. They were
helped in making this decision by the fact that the first questions were
also the simpler questions. When the bell rang, the students
were asked to hand in their results, which, of course, were incomplete.
When the scored tests were handed back to the students a week
later, the scores were disclosed as well as the larger mistakes.

Now the second stage takes place. Given the teacher's feedback
on the work they did, each student repeats and completes the
work at home, without restrictions and completely free to respond
to the questions as he or she chooses, whether one after the
other, by way of an essay, or by a combination of these. After a
certain interval, of perhaps three weeks, the students have to turn
in the work again, and again scoring takes place. This sequence
provides the teacher with two marks: a first-stage (lower goals?) and
a second-stage (higher goals?) grade.

A total of thirteen questions on the test represented the 
formats as follows: 

• open question or short answer: questions 1 and 2 

• open question or long answer: questions 2, 3, 4, 5, 6, 7, 8,
and 10

• essay or restricted response: questions 5, 8, 9, 10, 11, 12,
and 13

• essay or extended response: questions 12 and 13. 

Looking at this classification, one might expect students to 
handle successfully the first eight questions during the first stage. 
The remaining questions seem more appropriate for the second 
stage, at home. This was exactly what happened when this test was 
given its first trial in The Netherlands in two groups of twenty 
students each. The first seven questions were successfully answered
during the first stage by more than 75 percent of the students;
question 8 by between 50 and 75 percent, and the remaining
questions were successfully handled by less than 50 percent of the
students (with less than 25 percent on questions 9, 12, and 13).

It is not easy to convey a proper impression of the quality of 
the students' production in the second stage. The variety in their 
responses was enormous. Certain students completed some tasks 
in a straightforward way, answering the questions and paying no
attention whatsoever to layout and related topics. Others turned in
a veritable booklet, with color illustrations, self-made computer
software, and typed or word-processed answers. Most students
followed the order of the questions strictly and did not stray too far
from their content. Some wrote an essay in which all of the
questions were answered, and a number of students took the
opportunity to show their own creativity in content-related
questions.

Looking back at the results, as expressed both in scores and
in student comments, we note the following:

1. There was a relatively wide spread in scores for the first 
stage — from very poor to excellent. In the second stage the 
students performed much better. 

2. Girls' performance was relatively poorer than boys' in the
first stage. At the second stage, this difference almost
disappeared. In fact, the best results were from girls.

3. Intersubjective scoring was a satisfactory way to score the
second stage.

The task of evaluating the second stage was given to thirteen
teachers who knew neither the students nor their results for the
first stage; their marks for eleven students differed by no more than 10
points on a 100-point scale.

Production Tests

If one of our principles is that the testing should be positive, which
means, first, that we should offer students the opportunity to use
their abilities, and, second, that tests are part of the learning-teaching
process, the students' own productions offer fine possibilities
for achieving our purpose. The idea of own productions is
not really new. Reports on experiences with this form of evaluation
go back a long time. Treffers (1987) has introduced the distinction
between construction and production, which is not a matter of
principle. Rather, free production is regarded as the most promising
way in which constructions may be expressed. By constructions,
we mean (1) solving relatively open problems that elicit diverse
forms of production, due to the great variety of solutions they
admit, often at various levels of mathematization, and (2) solving
incomplete problems that, before they can be solved, require the
student to supply data and references.

The construction space for free productions might be even
more extensive and include contriving one's own problems (easy,
moderate, difficult) as a test paper or as a problem book about a
theme or a course, authored to serve the next cohort of pupils
(Streefland, 1990). For example, "Think of as many sums as you can
with the result 3, or 5" (grade 1).

Grossman (1975) reported on the unexpected results achieved
by this production task. She presents a few examples of work with
first graders. We quote two and include the teachers' comments:

Figure 4.4 The response of two students to a free productions problem
(Streefland, 1990). Used with permission.



I knew Jon was bright because he understood so well all that I 
taught in my structured lessons, whether I followed the sylla- 
bus or went just a little beyond it. However, I never suspected 
that he could handle number combinations in hundreds and 
thousands. There I was, teaching combinations up to 20, 
limiting my expectations and the children's ceilings. 

Mark was having trouble with arithmetic until I gave this 
assignment. He amazed me and he proved to himself that not
only he could do arithmetic but that he couldn't stop doing it. 
(He handed in two extra papers on his own on subsequent 
days.) The other children loved the activity, too. My feeling 
was one of constant amazement that they could do it at all.

Streefland remarks that the teachers' comments show that
both boys had amply exceeded the limits of the scholastic domain. 
Mark's work still reveals traces that show how he reflected on his 
activities. After a hesitant start, when he scanned the available 
arithmetic, he screwed up his courage, became conscious of self, 
wrote bigger, and sailed a fixed course through the system he built 
while constructing the problems. He transcended the boundaries of 
the arithmetic lesson and produced his own structure. At home, he 
continued intensively — the same Mark who was supposed to have 
problems with arithmetic. Jon must have seemed even more 
limited in his possibilities; yet, he anticipated a level of sums that 
were three grades higher in the curriculum — up to 10,000 - 9,995 
= 5. Like Mark, he worked systematically. Only in his written 
report was he a bit untidy. 

Both of the boys reflected on what they had learned within the
number system, and consequently they anticipated a future level
of the teaching-learning process. A test item presented earlier
(make as many sums as you can that equal 100, page 104) shows
clearly that, on the one hand, we have an assessment tool and, on
the other, we can use that tool for the learning process, a very
important aspect for proper assessment.

We next address in some detail what took place in an
American ninth grade on the subject of data visualization. The
students had been working (suffering, interacting, thinking,
discussing) for two weeks on a text designed in The Netherlands
and based on the philosophy of realistic mathematics
education. It is a philosophy, we believe, that fits reasonably
well with the philosophy of the NCTM Standards. The Data
Visualization booklet (de Lange & Verhage, 1992) was intended
for about five weeks of class activities.

At the end of their second week, the students got their final
test; the task was a simple one: 

■ At this moment, you have worked your way through the 
first two chapters of the book and completed a relatively 
ordinary test. This task is quite different: Design a test for 
your fellow students that covers the whole booklet. You 
can make your preparations from this point on. Look at 
magazines, books, and other available sources for data, 
charts, and graphs that you want to use. Write down ideas 
that come up during school time. After finishing the last
lesson of this booklet, you will have another three weeks to
design your test. Keep in mind:

1. The test should be taken in one hour.

2. You should know all the answers. 

It is tempting to show many exciting examples of the stu- 
dents' production. To be honest, there were disappointments as 
well. One student simply took a reasonably well-chosen collection 
of exercises from the booklet, avoiding any risk or creativity. The 
next "higher" design strategy is the one that mathematics teachers
often use: Take examples from the textbook, make small changes 
(in exponents, coefficients, or maybe context), and your test is 
ready. This worked for some students as well, although the answers 
sometimes made it painfully clear that the author was not among 
the brightest students. 

Our first example shows an exercise that operationalizes the 
lower level: 

■ The following are the top 20 point leaders of the Edmonton
Oiler team. Draw a stem-and-leaf diagram.

WG   208      DL   32
JK   135      KL   26
PC   121      PH   25
MK    88      KMC  23
GA    81      RG   23
MN    63      DI   20
MM    54      DS   18
CH    51      LF   18
DH    36      BC   17
WL    32      IP   12

What is the average for the 20 pieces of data?
(54.15)

What is the median?

What is the difference between the average and the median?
(22.15)

Why is the average higher than the median?
(Some people had extremely high points.)
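
The student's exercise can be worked through in a few lines. This is a minimal sketch using only the point totals listed in the item. It reads the second-highest total as 135, an assumption that makes the mean and the median agree with the student's stated difference of 22.15; the helper name is illustrative.

```python
from collections import defaultdict
from statistics import mean, median

# Point totals from the item (second value read as 135, see the note above).
points = [208, 135, 121, 88, 81, 63, 54, 51, 36, 32,
          32, 26, 25, 23, 23, 20, 18, 18, 17, 12]

def stem_and_leaf(values):
    """Return {stem: [leaves]} with the tens (and hundreds) as stem and the units as leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(points)
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")

avg = mean(points)
med = median(points)
print(f"average = {avg}, median = {med}, difference = {avg - med:.2f}")
```

The long run of empty stems between 8 and 20 makes visible what the last answer says in words: a few very high totals pull the average well above the median.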

The next example is interesting because it tackles the
problem of misrepresentation that often occurs in pictographs.
Two- or three-dimensional objects are used to represent
one-dimensional facts. The subject, "fair" or "honest" graphs, got
some attention in the booklet. One of the students created the
following exercise:

Does B or C show the volume of A doubled?

Which cylinder, B or C, has the same volume as A? Why?
(de Lange, Burrill, Romberg, & van Reeuwijk, 1993. Used
with permission.)
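
The point behind this exercise can be shown with a short calculation. The student's drawing is not reproduced here, so the dimensions below are hypothetical; the sketch only illustrates why a cylinder drawn "twice as big" in every direction does not show a doubled volume.

```python
from math import pi, isclose

def cylinder_volume(radius, height):
    """Volume of a right circular cylinder."""
    return pi * radius ** 2 * height

base = cylinder_volume(1.0, 2.0)      # hypothetical cylinder A
taller = cylinder_volume(1.0, 4.0)    # height doubled, radius unchanged
scaled = cylinder_volume(2.0, 4.0)    # radius and height both doubled

print(taller / base)   # ratio when only the height doubles
print(scaled / base)   # ratio when every linear dimension doubles
```

Doubling only the height doubles the volume, whereas doubling both radius and height multiplies it by eight, which is the kind of distortion the "fair" or "honest" graphs topic warns about.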

"Understanding Graphs" is a title that we took from one of
the student tests. So far, we were mainly seeing rather straightforward
questions about averages, means, and histograms, box
plots and stem-and-leaf diagrams, and scatterplots. Some of the
questions posed by the students already showed a tendency
toward somewhat higher order thinking skills and some of them
were surprisingly open. However, most of the questions discussed
next are more about data visualization. As with the
previous examples, we will not really discuss the problems but
merely present them. It is well worth the effort to try to answer
the questions and let the reader judge the quality of the problems
designed by the students.

■ The graph shown here is a bar graph involving information 
about how much money is being spent on military supplies 
compared to the country's GNP.

Is this information easy to read and understandable? Why
or why not?

(No, this information is not easy to read because the 
numbers on the left have no meaning. One cannot tell if 
they mean millions, billions, etc.) 

Could this information be represented better? Explain. 

(No, because a line graph would not work, a box plot, pie 
graph, stem leaf, etc.) 

Is this graph accurate? Explain your answer. 

(No, because this graph is based on an average between
1960-1979.) (de Lange, Burrill, Romberg, & van Reeuwijk,
1993. Used with permission.)

The following example shows clearly that the student under- 
stood that the booklet was more than just an introduction to mean, 
median, box plots, and histograms. 

■ In 1968, the United States was first and the Soviet Union
was second in machine tool production. In 1988, Japan was
first and West Germany was second. Compare the difference
in dollars between the two sets.

(The difference in dollars in 1968 is roughly $3 billion and
in 1988, it's $2 billion. This could mean that in 1988, the
amount of the production of machine tools is more evenly
spread.)



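[Figure: bar graph titled "The machine tool collapse: US dropped
from global dominance to fifth place in two decades," showing machine
tool production in billions of 1988 dollars by country. Legible bar
labels include Soviet Union, West Germany, Japan, and Italy; one
value is labeled $6.8.]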

Why is data like this collected? 

(It is collected to learn who is the leading producer of 
machine tools. For the United States, it is also helpful in 
that we know what to improve in our economy.) 

Would a line graph represent this data better? 

(No, I don't think so, because line graphs are primarily used 
to show sudden changes in connected things. A bar graph 
is better for this information because each country is 
separate and not connected in any specific way besides 
machine tool production.) 

What is the average for each of the sets separately and 
combined? 

(For both of the sets, the average is about $3.92 billion. It's
only an approximation because you can't tell the exact 
numbers from the 1968 bars. For 1968 alone, it's about 
$2.74 billion. For 1988 alone, the numbers average $5.02 
billion.)

Is there any way to get the averages besides computing?

(Yes, I think so because if you look at the length of the
bars themselves and not the actual numbers, you can tell
pretty closely for each of the individual sets. For both of
the sets combined, it gets a little harder because you
have to balance them on an X sort of thing.) (de Lange,
Burrill, Romberg, & van Reeuwijk, 1993. Used with
permission.)

It will be clear from these examples that the students' own produc- 
tions offer excellent possibilities for improvement for the teaching- 
learning process, as well as for assessment. It also goes without 
saying that there are difficulties and problems to overcome regard- 
ing the design, implementation, and scoring of tasks like those 
described. 



Fragmented Information Reasoning Test



A special kind of assessment was recently developed at the 
Freudenthal Institute to tackle a specific need in mathematics 
education. Mathematics in school has veered in the direction of 
increased applications, mathematization, and reasoning to use 
mathematical applications in other disciplines. College-bound 
students should be able to find relevant data from redundant 
information and should be able to cope with information or data 
that are lacking. They should be able to formulate a hypothesis 
and support this hypothesis in the best way possible, given the 
scant information. How does fact A relate to fact B? This is a
distinctly different activity than to give a mathematical proof. 
Proofs are relatively easy, certainly at secondary school level — if 
we teach the students any real proof at all, which means going 
beyond Pythagoras. 

As we can see, too often the reasoning in many scientific
publications leaves something to be desired. Especially in social
and life sciences, "commonsense" logical reasoning is often lacking
or incomplete. An effort to develop in students a capacity for
this commonsense logical reasoning on a scientific level should be
part of the mathematics curriculum. But how do we assess it? In
response to this question we initiated the process that resulted in
the first modest attempt to operationalize the goal of assessing
reasoning.

Our main impetus for developing a means of assessing reasoning
came from information presented in a scientific article entitled
"The Importance of Aquatic Resources to Mesolithic Man at Island
Sites in Denmark," by Noe-Nygaard (1982). This article had already
been "translated" mathematically to a lower level because of
the interesting use of secondary school level mathematics (de
Lange, 1984).

The construction of the assessment instrument took the 
following course. We tried first to identify relevant information in 
the original article. Actually, we divided the article into several 
parts. We added other data; on the one hand, to give students some 
background, on the other, to make the information redundant. This 
resulted in twelve sets of information or data on twelve different 
sheets of paper. And the task for the four students involved was to 
support or reject the following hypothesis: "During the Mesolithic 
period, people from mid-Europe went farther north in spring and 
summer than previously assumed. Recent research leads to the 
conclusion that they did not stop in mid-Germany but went as far 
north as Denmark." 

The twelve information sheets were very different in na- 
ture — one, for instance, simply provided data from an encyclopedia 
regarding the Mesolithic period; another, a map of Denmark; and 
the next taken almost directly from the original article: "Excavations
near the now-vanished Lake Praestelyngen in Denmark
resulted in finding numerous remains of fish skeletons. In total, 
around 5,000 parts were found. Scientific data methods have shown 
that these remains date from the end of the Mesolithic. Scientists 
have tried to identify which kind of fish were involved. According 
to these sources, surprisingly, many of the bones were from pike — 
around 80 percent of the bones. Pike was a popular fish at those 
times. The rest was prey fish like tench and perch." 

It was expected that the students might get very confused
when confronted with this test and might not know where to start. 
So a couple of activities were included at the lower level to make 
the format more useful. The original article offered ample opportunities
for getting started, because not all of the graphic representations
were without flaws. This resulted in the following page:

Because growing is a cyclic process . . . circular graphs are
sometimes used to describe growth processes. The researchers
at Praestelyngen Lake designed the following graphs that
represent part of their findings:

Graph of the time of death of the pike at Lake Praestelyngen (in
percentages).



Some critical remarks can be made about the correctness of 
these graphs. 

Study the graphs, keeping in mind the following questions:

How would the first graph have looked if the pike had all 
died in the period March-June? Do you get the same area for the 
dotted area? 

And how about the second graph: Does the area for the 
month of May really represent the number of deaths in May? 

Also give suggestions for better graphs and use those (if
you like) in your work. (de Lange, for the National A-lympiade,
1992. Used with permission.)
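
The second question in the task turns on how a circular graph encodes a value. As a sketch only, and assuming (the figures are not reproduced here) that each month is drawn as a sector of equal angle whose radius $r$ is proportional to the monthly percentage $p$, the area of a sector grows with the square of $p$, not with $p$ itself. The symbols $k$, $\theta$, $r$, $p$, and $A$ are introduced here for illustration and do not appear in the original task.

```latex
\[
  \theta = \frac{2\pi}{12}, \qquad r = k\,p, \qquad
  A(p) = \tfrac{1}{2}\, r^{2}\, \theta = \frac{\pi k^{2}}{12}\, p^{2}
\]
\[
  \frac{A(2p)}{A(p)} = \frac{(2p)^{2}}{p^{2}} = 4
\]
```

Under that assumption, a month with twice as many deaths would be drawn with four times the area. This is one reason a student might propose a bar graph, or a circular graph with $r \propto \sqrt{p}$, as a better representation.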

The task was intended as a group task in a restricted time period. 
At the time of writing, only two experiments had taken place, each 
giving the students six hours. 

The videotape of the first experiment indicates that all of the 
students operated in a similar way: First, they each read the 
materials and, after a short reflection on the nature of the task, they 
tackled the individual pages, starting with the page of graphs just 
shown. In this way, they slowly grew into the task. Next, they tried 
to glean the relevant information from each of the twelve pages. 
This was followed by a key activity: the discussion about whether 
or not to accept the hypothesis. It was very interesting to see a group 
of four male teacher-students work all morning resulting in a firm 
decision to reject the hypothesis. After a short lunch break, they 
decidedly chose that direction. One student took a more careful 
look at an atlas to get a clear idea of the geographical position of the 
places involved. After at least half an hour, the student at the atlas
rose and told the other students with much confidence: "They have
been there!" An interesting discussion now took place as to 
whether they should change and support the hypothesis. They did 
change. And here we have an essential aspect of this kind of test: 
There should be room for two directions. The students can either 
support or reject the hypothesis. 

We are still far from reaching firm conclusions about this test 
format. But indications are that it could be promising: The students 
reason in a way that we seldom see: They have to look at details, 
discuss which position to take, write and communicate at an 
appropriate level, design and use adequate illustrations, and at the 
same time, carry out a substantial number of activities at lower 
levels. One question to be explored is whether we can design 
similar activities at different levels. 

Project Work 

Project work has had supporters for a long time, especially in the 
United Kingdom. However, it is still unclear what a project really 
is. Several definitions have been used, including the following: "A 
project should be a well-balanced and well-presented piece of work, 
showing completeness and depth of study, and offering genuine 
mathematical involvement with the topic" (Allen, 1977). As a 
result of the ambiguities in some of the terms used in this defini- 
tion, a more formal definition of a project cannot have much value. 

An essential question is, Why should we have projects at all?
This is even more relevant because many tests do not meet our 
standard that a test be practical: An appropriate project is not only 
difficult to design (like many of the formats just described), but it
is also hard to organize and carry out. Allen (1977) gives several 
arguments in favor of project work: In the process of completing a 
project students should be extending their experience in either the 
collection of information or learning techniques to handle informa- 
tion. He adds that naturally we cannot specify the precise knowl- 
edge to be gained in the form of objectives because of the potential 
variety of subject matter open to student selection. 

Another goal of project work can be the original application of 
a technique or a number of techniques that require the student to 
reflect upon the structure of the problem. He or she will also have
to consider alternative models and finally make some kind of 
evaluation. So it seems there is little question but that the ideal 
situation should operationalize lower, middle, and higher level 
goals, irrespective of the taxonomy we use. Another important
issue is whether student involvement in the project will produce
results different from those expected in other settings. On the basis 
of our research, we know that we do indeed measure goals in project 
work that are different from goals in traditional course work. Allen 
reports a correlation around 0.44, which seems to indicate that 
project work results are more or less independent of the more 
traditional assessment work. 
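
One way to read the reported figure, offered here as an illustration and not as part of Allen's analysis: if the relation between the two sets of scores is taken to be linear, the square of the correlation coefficient $r$ gives the proportion of variance the two measures share.

```latex
\[
  r \approx 0.44 \quad\Longrightarrow\quad r^{2} \approx 0.44^{2} = 0.1936 \approx 0.19
\]
```

On that reading, roughly one-fifth of the variation in project results is accounted for by traditional assessment results, and about four-fifths is not.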

Project work can be done in groups, and usually is, but it
also can be done individually. It can be carried out according to 
a strict scenario or in complete freedom. It can be rather short, 
or it can be a lengthy investigation. Subjects for investigation are 
abundant: the adjustment of traffic lights (an example used 
earlier), determining which supermarket is cheapest, figuring 
out how to keep records so that a store never has too much or too 
little in stock, tracking pupil and student movement in school, 
placement of new buildings in a town layout, determining the 
pollution of lakes or air, finding the blind spots in traffic control, 
making maps and projections, and in many data-collecting 
activities. 

It is difficult to describe a typical project activity. A structured 
example may require an entire week of work on a special book or 
subject that otherwise never would have been treated. When 
students have an entire week in which to integrate mathematics 
with other information, their freedom is somewhat limited, but the
result for both the students and the teacher is much more easily 
judged than in projects that leave them greater freedom. 

At this time, it is unclear how we can best assess project work. 
There is little doubt that well-designed project work (which means, 
in most cases, small projects) will have beneficial effects for the
teaching and learning process. However, we cannot be certain
whether project work measures this process more effectively, or
whether it measures different dimensions of it, compared with
some of the other formats we have described. Project work should be
considered as a viable assessment tool only if the other formats
described, which are usually easier to execute, do not operationalize
the goals we wish to test. 

Portfolios 

In the United States, much attention has recently been devoted to
portfolio assessment in mathematics. Portfolios may seem comparable
to projects, but are actually quite different. Murphy and Smith
(1990), in describing the use of portfolios in the teaching and
learning of writing, state the following: "Portfolios are obviously 
more than a collection of artifacts. They are, even before the 
unfilled folders are home from the store, a reason for talking. And, 
depending on the way the talk goes, the portfolio can take many 
different forms. In a sense, coming up with a portfolio project is like
choosing what to teach. The decision automatically creates possibilities
and limitations. In the infinite scheme of what can be
taught, teachers choose for their particular classroom communi- 
ties. In the same way, they can make decisions about portfolios 
with themselves and their students in mind." 

Mumme (1990) argues that a portfolio, as a tool for assessment,
focuses on the student's productive work. It measures what
the students can do rather than what they cannot do, which is, of
course, one of our primary principles. But there are many other
instruments that do this.

Portfolios provide insight into many aspects of student
growth: in mathematical thinking, understanding, ability to express
ideas, attitudes toward problem solving, and others. A portfolio,
when used for assessment, is more than a "folder," according
to Mumme. It is a deliberate collection of student work that can be
used to provide evidence of understanding and accomplishment in
mathematics. A portfolio offers the potential of providing more
authentic information than other formats about a student's mathematical
endeavors. Such information can help students assess
their mathematical progress, assist teachers in making instruc- 
tional decisions, improve communication with parents, and enable 
educators to assess the mathematics program at their school. 

To support her plea for portfolio assessment, Mumme quotes
Everybody Counts (MSEB, 1989): "We must ensure that tests
measure what is of value, not just what is easy to test. If we want 
students to investigate, explore, and discover, assessment must not 
measure just mimicry mathematics. By confusing means and ends, 
by making testing more important than learning, present practice 
holds today's students hostage to yesterday's mistakes." 

Does a portfolio really make manifest the philosophy implicit
in the MSEB's position? To answer this question, it is
necessary to consider the general attributes of portfolio assessment,
described earlier, in light of the following examples of
student mathematics activities that can be included in a portfolio
(listed in Mumme's recent monograph, Portfolio Assessment in
Mathematics [1990]):






student written work 

individual and group work 

rough drafts and finished products 

student writing 

projects and investigation 

diagrams, graphs, charts 

work written in the student's first language 

photos of student's work 

audiotapes of student explanations or oral presentation 
videotapes 

computer printouts and disks 

work dealing with the same mathematical ideas sampled at 
different times 

According to Mumme, teachers like portfolios for discussion
between teacher and student, for meetings between teachers, for
discussion between students, and for presentation to the school
and community. The question remains: Are portfolios the best tool
for this purpose? As one who has seen alternative forms of assessment
functioning in different places on a larger scale, I would like
to pose some questions about the effectiveness and validity of
portfolios, interpreted in the broadest way. The impression I have
from, for example, frequent visits to the United States is that
portfolio use is gaining momentum because current assessment
procedures are not effective measures of teaching and learning.
Portfolios offer freedom, are fun, are not structured, are open; and
they represent a lot of work for teachers, exactly the opposite of
established assessment practices. But many teachers are unaware 
of the variety of assessment formats available. And many of those 
formats do the same things that portfolios are supposed to do for 
students, teachers, and administrators in a much more reliable and 
direct way. 

Balanced Assessment Package

Overall, according to Burkhardt and Resnick (1991), for teachers to
judge student performance and growth, a balanced assessment
package is needed. Their claim is that such a package represents the
range of mathematics that we now aim for students to be able to do,
as articulated in the NCTM Standards (1989). The package these
authors propose, based on a sequence of mathematics activities
lasting a little over four weeks, is well worth careful study in light
of this review of assessment instruments. Those who examine this
package will decide for themselves how well balanced it is. But it
seems certainly more balanced than is usual in assessment. And it 
poses the possibility that even more balanced assessment packages 
can be created. The question thus becomes, Who decides what 
constitutes appropriate balance? We have described numerous 
formats that attempt to operationalize objectives for specific levels 
and outcomes. But deciding on the makeup of an ideal package 
depends not only on research outcomes that are currently lacking, 
but also on a philosophy of mathematics and mathematics educa- 
tion that is in the process of evolving. 

Burkhardt and Resnick claim that their package aligns with 
the Standards (NCTM, 1989). Mumme (1990) claims that portfo- 
lios offer opportunities to operationalize the philosophy of the 
Standards. The states of California and Connecticut, as well as 
several others, claim that they offer innovative forms of assess- 
ment that fit the Standards. We believe that much of the work we 
have done in The Netherlands on assessment tasks is consistent 
with the principles of the Standards. It may be that many others 
working in the field of assessment will make the same claim, and 
we have to be concerned that the efforts of some will converge on 
the practical tasks, which, we already know from our research, 
seldom measure middle and higher learning goals. 



SOME FINAL NOTES ON CHANGE
AND ASSESSMENT

Points That Require Attention

Designing tests and administering them in ordinary classroom 
practice is not a simple task, as we have seen. During the design of 
assessment tasks, one has to be very clear which goals are being 
operationalized, which context to choose or formats to consider, 
and the practicality of presenting it in the classroom. But other 
points also need serious consideration when choosing a balanced 
package of assessment tools, as the following questions indicate: 

■ Is the test to be taken within a fixed time interval (re- 
stricted-time test)? 

■ Is the test to be taken individually or in groups? 

■ Is the test to be taken at home or at school? 

■ Is the test a single-strand test, an integrated test, or an
interdisciplinary test?

■ Is the test part of a continuous assessment practice, or is it
part of a more discrete scenario?

■ And for some people the most important question, How 
objectively scorable is the test and what scoring tools do we 
have to assure that the scoring will be as accurate and fair 
as possible? 

These questions are deserving of a separate article. The following 
reflections are based on our experience during a decade of experi- 
ments on assessment. 

The Timed Test Can Have Many Variants. We usually identify
the timed test with the individual restricted-time written test at
school. This format forces students to perform at the same sitting
under the same external conditions within the same time frame.
Research (de Lange, 1987), as well as informal teacher experiences,
seems to indicate that girls perform less well than boys under these 
conditions. 

But timed tests are also tests where the students are allowed 
to work in groups or at home if we pose strict time limits on their 
task. In these relatively open situations, it is more difficult to 
decide how much time we should allow students. The greater 
freedom provided by these options offers more time for reflection 
and creativity. 

Restricted-time written tests have certain advantages, like 
practicality, and disadvantages, such as the need on the part of the 
student for peak performance at a time decided by the teacher and 
under pressure. But the reduction of time restrictions in tests has 
disadvantages too, especially if the consequence is that the stu- 
dents can take their tests home. One option for carrying out 
assessment tasks in an almost unrestricted-time format is to do it 
at school: Have the students do an extensive performance task from 
9 to 2 P.M. or so. Of course, practically, this poses real problems,
given the standard school day in most schools, but if we take into
consideration the fact that this arrangement is necessary only once
or twice per school year, it may improve the picture considerably. 

Certain Tasks Are More Suited for Group Work Than Others. It
may be enlightening to give a multiple-choice task to a couple of
students and ask them to reason aloud. Of course, multiple-choice
tasks are seldom suited for group work, but as a learning experience
they can be very rewarding. Not infrequently, the discussion takes
a course like the following: "This is certainly wrong, this too. So
there are two possibilities. What does the designer of the tasks want
us to answer?" As a Dutch teacher once commented, "The students
are secondhand thinkers. They are not accomplishing a mathematical
activity but merely reflecting on past experiences with multiple-choice
and trying to follow the thinking process of the designer."
Another teacher expressed explicitly what many people know but 
some test designers deny: "Teachers really teach how to pass a 
multiple-choice test. A large part of the year is spent in training 
students for the test instead of teaching them mathematics." 

Group work is not very compatible with multiple-choice, but 
offers good possibilities in other formats. A group can be just two 
students, sometimes three or four, or even the whole class. Two
students can work very successfully on extended-response questions,
essays, two-stage tasks, production tasks, and in a limited
way, on project work. For projects, it is usually necessary for several
persons to cooperate successfully, but individual projects are feasible
too. The fragmented information reasoning task has to be
done by several students working together because the discussion
about the hypothesis is the kernel of the whole activity. There is no 
way for the students to avoid this stage. 

The advantages of group work in assessment are well known: 
reflection on one's own thinking, reasoning and reflection, com- 
munication, production, cooperation, arguing, negotiating. The 
disadvantages are also clear: evaluating individual contributions 
fairly (if such are made), the practicality problem, the makeup of 
the groups, and the scoring. Cohen (1986) argues that evaluation is
not so difficult as it may seem: If the students are at least fourth
graders, many important questions can be answered via a questionnaire.
But the questionnaire that she refers to focuses on how the
students evaluate the group process, not on the mathematical 
quality of the result. 

Another factor to be considered in group work is the prospect 
it offers for systematic interactive scoring by an outside observer. 
Cohen argues that it is relatively easy to obtain a rough estimate of 
the rates of participation of the students. This may not be true at 
all: A student who is thinking on his own may, to an outside 
observer, seem to contribute little, yet still come up with the 
turning point in the discussion, and this point may not always be
easily recognized. Besides this complication, we have to bear in
mind that much group work in assessment has to take place at
home or outside of school.

We have observed successful group work in assessment in
different ways: strongly organized groups with a presider, secretary,
and workhorses of different kinds. The students were evaluated on
the basis of two factors: how they functioned in the group (the 
assessment Cohen is talking about) and their individual math- 
ematical contributions separately. Another option is to allow the 
group process to work by itself. The group as a whole is responsible 
for the product and all of the individual members of the group get 
the same score, irrespective of their contributions. The underlying 
assumption is that the harder working or smarter students will not 
let the others coast along for a free ride and that everyone has to 
make a contribution. This seems to work especially well in groups 
of two students. It seems important to note at this point that 
because of the assumption that we are working with a balanced 
package, we have no problem with part of the final score for a 
student being gathered in individual tasks and part in group work. 

Group work can take place within the classroom or school, 
but also at home. This holds for other tasks as well. The two-stage 
task was especially designed with the idea of combining restricted- 
time school-based tasks with a more unrestricted part to be done at 
home. Although the task was difficult to design, the results were 
very promising, as we indicated earlier in this chapter. Because the 
task remained unchanged, the components to be included were 
clear — only the conditions were different. On the one hand, we had 
limited time, a school context, a pressure factor, and equal condi- 
tions for the students (at least in the externals); and on the other, 
unlimited time, minimum pressure, and access to additional sources
both in materials and in persons. The results for both students and 
teachers were satisfying because of the two grades and the insight 
it seemed to offer on the different qualities of the students on the 
same mathematical content at different levels. The fact that the 
correlation between the results of the two phases was low (< .50)
makes it necessary to consider the "take home part" of a balanced
package seriously. 

Test Breadth. Another delicate subject that needs more attention
than it usually gets is the problem of the "broadness" of a test.
All too often, the students have to prepare for an algebra test, or a 
geometry test. Sometimes they may prepare for an algebra or 
geometry test, but in most cases they deal with an assortment of 
algebra or of geometry items and are not challenged to integrate 
them. It seems important to invest some effort in developing 
integrated tasks that draw on all of the domains students are
supposed to be familiar with or know how to handle at a particular 
moment. This is not a simple challenge, especially if we continue
to label the different aspects of mathematics as we usually do and
offer students compartmentalized mathematics. 

Another area that needs further study, experimentation, and 
careful evaluation is test preparation. "To prepare for a test" is a
well-known and often well-defined activity for students. And in
the standardized test arena, there is even a multimillion dollar
industry that claims credit for doing this successfully. This, of
course, is unfortunate for education and for politics. Assessment is
a part of the teaching-learning process; it will be interesting to see
how well the principles and goals for mathematics assessment, as 
published in For Good Measure (MSEB, 1991), will be implemented 
in the year 2000. The first principle articulated coincides with our 
own first principle: "The primary purpose of assessment is to 
improve learning and teaching." And to make things a little clearer:
"Whether with classroom assessment or external assessment, the 
process and result of assessment must inform and enhance the 
learning and teaching process rather than narrow or restrict it." If 
this ideal is to be realized, we also have to make decisions about the 
continuity of the process of assessment. Do we really want our
students to "prepare for a test," or should a student, in theory, be 
ready for a test at any time? To select the assessment tools for a 
balanced package includes the spread in time of the use of these 
tools. Do we do all assessment during a single week in a semester 
or spread it out evenly over time, and what are the different
advantages of one approach or the other? In our opinion, assess- 
ment should be a more or less continuous activity — like the 
teaching-learning process — for improving that process in an opti- 
mal way. 

Finally, there is the matter of objective scoring. Our extensive 
classroom work, research (de Lange, 1987), and commonsense
thinking make the following hypothesis look very easy to defend: 
The gains we make by obtaining a more or less complete measure 
of overall knowledge and capabilities by using a balanced package 
of assessment will by far outweigh the disadvantage that we have 
by "losing" a complete objective score. Intersubjective scoring and
proper scoring instructions give enough guarantees for a fair mea- 
sure, fair to the student and fair to the curriculum. Admittedly, we 
need more information and further trial studies and research. But
the development of new assessment tools and guidelines on how to
use them and score them is essential. Successful reform in mathematics
education requires that curriculum, its philosophy, methods
of instruction, teacher training and enhancement, and assessment
be revitalized in tandem. The purpose of this chapter is to
anticipate the problems and aspects of the assessment that we must
address in the near future. 

Assessment: No Change Without Problems

A number of national documents, the changes that are taking place 
in mathematics education, and our decade of research on the 
teaching and learning of school mathematics all point to the 
necessity of primary changes in the way student knowledge of 
mathematics is assessed. Assessment should utilize real problems, 
which often will mean real-world problems and their applications, 
as well as the complexities that result when the world is brought 
into the classroom. Having made this point, it is immediately 
necessary to look at a number of other issues that we have to 
confront. These include the following:

■ Teachers, test designers, parents, administrators, public 
officials, and citizens need to develop a new attitude 
toward assessment. This point is often underestimated. 
We cannot simply rely on assessment summits, reform 
publications, and other compelling developments in as- 
sessment. Society has a certain image of assessment that 
will take years to change and improve. The damage done by 
past assessment practices, especially in the United States, 
will take at least a decade to repair, and no quick or cheap 
solutions are available. 

■ Different levels of mathematical activities need different 
assessment tools, which are demanding and time consum- 
ing to design and require a major research and testing effort. 

■ To design a balanced assessment package will be difficult. 

■ To interpret the different strategies and processes that the 
students will come up with in more open assessment will 
be hard for teachers. Teacher training with special emphasis
on assessment is not only needed, but will enrich
teachers' understanding of the problems we are dealing 
with. 

■ Different problems need different contexts, which take 
into account all kinds of variables, a point made earlier. A 
special problem will be to find the balance between a good
context and a good mathematical problem. 

■ Scoring and judging the quality of the diverse assessment 
formats available to us will be more complex and varied
and will be considered more difficult than scoring and 
judging test items is at present.

If these issues come as a surprise to a certain degree, then it
only makes clearer the magnitude of the challenges we face; it 
defines the seriousness of the situation in assessment at the present 
time. Assessment has become separated from its major participants:
the students, their teachers, and the mathematics curriculum.
There can indeed be no change without problems.

5 ❖ The Invalidity of Standardized Testing for
Measuring Mathematics Achievement

Robert E. Stake



Mathematics teachers across the United States report strong and
still growing emphasis on standardized achievement testing in
their schools. They recognize a need to distinguish between valid
and invalid uses of testing. The purpose of this chapter is to help 
them make that distinction. 

Testing as an activity and individual tests as a tool are neither 
valid nor invalid until the results are interpreted in some way. Only
the interpretation of test scores in particular situations can be said
to be valid or invalid (American Psychological Association, 1985;
Cronbach, 1980; Jaeger & Tittle, 1980; Linn, 1989; Messick, 1989).

Standardized mathematics tests are used in many situations 
to obtain valid indication of which students are better at solving the 
types of problems included in the test. Student motivation for 
scoring well (Raven, 1992), time limits of the test (Shohamy, 1984),
unfamiliarity of the language and format of the test, and other
features of test production and administration (Traxler, 1951;
Airasian & Madaus, 1983; Freeman et al., 1983) can contribute to
invalidity in interpretation. But good achievement tests, properly
administered, with scores cautiously interpreted, are an appropri- 
ate component of mathematics education. In our schools, it is a 
responsibility of teachers to recognize and reward superior perfor- 
mance. Test scores contribute to teacher awareness of how stu- 
dents compare with one another. 

Standardized mathematics test scores are not, however, a
sound basis for indicating how well students are becoming educated
in mathematics. Scores that do a good job of indicating which
students are doing best and which are doing relatively poorly do not
necessarily provide a valid indication of subject-matter mastery.
One test alone will not provide valid measurement of the mathematics
achievement of individual students or of a group as a
whole. Test content almost always is too narrow. Just as a few
students do not represent all the students in a school and a few
books do not represent all the books in a library, twenty or thirty
test items do not represent the broad range of mathematics skills 
and knowledge that teachers are teaching. For measurement of 
subject matter attained, the simplicity of testing is at odds with the 
complexity of teaching and learning. 

When we talk about mathematics education, we refer to long-standing
and well-deliberated definitions, though they are not
often put into words. With good reason, we are reluctant to change 
our notions of education to fit the simpler definitions offered by 
standardized testing. For example, part of the established meaning 
among educators and others is that education is a personal process 
and a personally unique accomplishment. For each student, expe- 
rience in and out of school is different; thus, the formal and 
informal meanings of arithmetic, algebra, geometry, and all of 
mathematics are different from student to student. Furthermore, 
each mathematics teacher's understanding is personally constructed 
and somewhat tuned to the cultural experience of teachers. The
ideas they share in the classroom and stimulate in the minds of
students differ from teacher to teacher. Of course we speak of 
standard courses of study, common objectives, and shared under- 
standing of mathematics — but the education of youth includes the 
application of mathematics to many unique experiences of past 
and future. Part of the invalidity of achievement testing is due to 
constraints imposed by common aims. But there are other con- 
straints as well. 

Much of mathematics education is beyond accurate assess- 
ment. That does not, of course, relieve the teacher of responsibility 
for searching for valid evidence that education is happening. Much 
good evidence of mathematics achievement comes from reflective 
interaction with individual students and within the class as a 
whole — interaction during recitation, exercises, projects, and test- 
ing. Test scores alone are a flimsy indicator of the mathematics that 
students have learned.

In this chapter, I will describe the mismatch between the 
highly abstract but simple constructs of mathematics held by 
developers of standardized tests and the situational and much more 
compounded conceptualization of mathematics held by teachers — 
a difference that extends, of course, to interpretations of math- 
ematics achievement. This is a problem common to other subject 
matters, but only mathematics education will be considered here.

The teacher-reader will be reminded that standardized testing
serves certain school administration and classroom management
purposes. The appropriateness of such use is not considered directly
to be an issue of validity, yet it belongs in the discussion.
Because validity is rooted in the interpretations teachers and others 
make of test scores, I think it important to show how testing as a 
process, how assessment as an instrument of educational reform, 
is seen by teachers to contribute to or detract from school improve- 
ment. The more they acquiesce in the idea that good test scores 
mean good teaching, the more readily they will regard testing as 
related to good changes in school programming and the more likely 
they will interpret tests as defining the mathematics to be achieved.

For any particular use of a standardized achievement test, 
validity of the measurement indicates the quality of information 
conveyed about how well the students are achieving — for lifetime 
so far, for the year, or even for the lesson. The key distinction in this 
chapter is between a generic, single-dimension concept of achievement
— a view promoted by test specialists — and a complex, experiential
conceptualization of achievement detailing the many steps,
the many differentiations, of content and skill — a view held by 
teachers. My conclusion will be that these views are so different 
that the panorama of achievement that mathematics teachers 
regularly scan cannot be measured validly with the standardized 
achievement tests in use today. 



MATHEMATICS EDUCATION 

It seems we all know what school mathematics is. For many people, 
such common experience seems to need no definition. But that 
presumption, held too by more than a few mathematics teachers, is 
a misperception. Mathematics educator Thomas Romberg has 
noted, "Mathematics is viewed as a vast collection of vaguely related
concepts and skills to be mastered in strict order, with the sole
objective of becoming competent at carrying out some algorithmic
procedure in order to produce correct answers on sets of stereotyped
exercises" (1987). Mathematics indeed embraces a vast array of
concepts and operations, but in the teaching of mathematics, student
understanding is more the objective than development of
powers of calculation. In point of fact, the detail and the scope of
mathematics education exceed the best of definitions. 

The Underperception of Mathematics Teaching and Learning

Neither good nor bad teachers stick to the point. In the classroom, 
good teachers roam the content terrain, point out and extend major
connections, introduce concrete situations of relevance. As they
teach them, mathematics knowledge and skill are not collections 
of discrete elements. Each problem type and algorithm is linked 
into various networks of knowledge, diverse traits, other systems 
of thinking (Scheffler, 1975; Romberg and Carpenter, 1986). One-digit
addition and two-digit addition are closely linked; one-digit
addition and two-digit multiplication are less close, yet linked in 
several ways. These several ways become multitudinous when 
applications are acknowledged. The applications of mathematics 
quickly become too numerous to itemize in tables of contents, lists 
of objectives, lesson plans — yet the teacher, not only deliberately
but subtly and unconsciously, adds dimensions of meaning to each 
operation and concept. 

The chapter titles of a mathematics textbook seem simple 
enough. For example, the chapters of the book used by the upper 
sixth grade in the Duxbury (Massachusetts) Intermediate School in
1989 are listed in table 5.1. A quick review of chapter 1 ("Addition
and Subtraction of Whole Numbers") finds further subdivision into
the topics of place value, reading and writing groups of three 
numerals, one- and two-digit addition and subtraction, properties 
of addition, three- and four-digit addition, money units, missing 
numbers, five-digit addition, three-digit to six-digit subtraction,
subtraction with zeros, comparing numbers, greatest and least 
numbers, rounding numbers, estimation of sums and differences, 
Roman numerals — with some special attention to consumer skills, 
career interests, and problem solving. And each of these topics 
could be further subdivided. The inventory of topics spanning all
fifteen chapters is extensive. 

TABLE 5.1. CHAPTER TITLES OF A MIDDLE SCHOOL
MATHEMATICS TEXTBOOK

Teachers classify mathematics just as the textbook authors
did, into various domains, sometimes for teaching and testing
(Collis, 1982; Collis & Watson, 1989). Their formal classification
headings are useful as conceptual structure but draw too much
attention to the well-known main topics of mathematics education.
Educational researcher Lauren Resnick has observed, "The
range of mathematical concepts to be learned [is] much broader 
and only a few of these crafts have been intensively studied." 
(1989, p. 164). Furthermore, learnings within a class or even 
within a small subclass are related to each other in many more
ways than indicated by any single classification scheme. And 
many of the tasks and concepts within a subclass have important 
uniquenesses. To show this, we can examine the seven items of 
figure 5.1, seven problems not unlikely to appear in a 40-minute 
activity in algebra class. 



Figure 5.1. A family of seven mathematics items.



These seven items cut across several content domains (Hively
et al.), yet a teacher might include all of them within a single
lesson, within a single objective, or refer the solution of each of
them to a single page of the textbook. Each item is unique, a special
variation on the others. Each will be more or less well understood
by students and thus more or less difficult. Still, different in form
and notwithstanding transformations, they belong to a family. The
family here is not defined by mathematical operations as much as 
by the practical problem of dealing with two temperature scales, 
Celsius and Fahrenheit.
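
The idea of a family held together by one practical relationship can be sketched in a few lines of code. The items below are hypothetical and are not the seven items of figure 5.1; the function names and the choice of Python are assumptions made only for illustration. Each item is a different transformation of the same Celsius and Fahrenheit relationship, and each asks the student for a different kind of understanding.

```python
from fractions import Fraction


def celsius_to_fahrenheit(c):
    """Apply F = (9/5) * C + 32."""
    return Fraction(9, 5) * c + 32


def fahrenheit_to_celsius(f):
    """Apply the inverse relationship, C = (5/9) * (F - 32)."""
    return Fraction(5, 9) * (f - 32)


def crossing_point():
    """Solve x = (9/5) * x + 32 for the reading shared by both scales."""
    return Fraction(-32) / (Fraction(9, 5) - 1)


# Hypothetical items: one relationship, four different tasks.
items = {
    "Convert 100 degrees Celsius to Fahrenheit": celsius_to_fahrenheit(100),
    "Convert 212 degrees Fahrenheit to Celsius": fahrenheit_to_celsius(212),
    "Fahrenheit rise for a 10-degree Celsius rise":
        celsius_to_fahrenheit(10) - celsius_to_fahrenheit(0),
    "Reading at which the two scales agree": crossing_point(),
}

for question, answer in items.items():
    print(f"{question}: {answer}")
```

A student who handles the first conversion by rote may still stumble on the third or fourth item, which is the sense in which each member of the family remains unique.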

Inventories

In the backs of our heads, we all have epistemological inventories of 
mathematics education that extend beyond the families, classes, 
levels, and lattices of quantification. These inventories are particu- 
larly broad when we include the applications of mathematics. The 
inventories can be organized around any one of the many conceptual 
structures proposed, such as that by Edward Haertel and David Wiley
(1990; see also Henderson, 1963). Such classification schemes gravitate
toward a powerful simple structure. Few of them reflect the real
complexity of mathematics education to be found at every grade level.
Inventories of teaching and learning, were they actually recorded, 
would show that complexity. They would reveal each teacher's 
complex conception of the nature of mathematics education. 

For our inquiry into test validity, we need content inventories,
not just categories of topics and principal kinds of mathematical 
activity, but inventories that reflect existing definitions of education 
and the complexity of teaching and learning in existing classrooms. 
Even people who know little mathematics can identify numerous 
categories. But teachers go much further, particularly in their choices
of what and how to teach. Their conceptualization of the content of
mathematics education is vast and detailed. They decide, for example,
whether to treat vertical addition the same way they treat
horizontal addition. Most treat subtraction with borrowing as different
from subtraction without borrowing. Some would treat multiplication
of decimal numbers with zeros immediately following the
decimal point as a special learning. Mathematics educators have 
been diligent in classifying such operations, but teaching practice
regularly creates a host of additional subclassifications. 



1. Three children wish to divide two oranges evenly among
themselves. Carefully peeled, one orange is found to
have 12 sections, the other 13. What should they do?

2. Three children wish to divide two oranges evenly among
themselves. Unpeeled, one orange weighs 8 1/8 ounces
and has 12 sections. The other weighs 7 3/4 ounces and
has 13 sections. What should they do?



Figure 5.2. Two problem-solving exercises.

Consider the problem-solving exercises in figure 5.2. What if
two sections of an orange are withered? And what if juice is lost by 
cutting? At what point does inequality not matter? We pause 
reflectively before placing these two exercises in the same category 
in our inventory. As teachers, we associate teaching strategy with 
learning tasks: Should we schedule peer group dialogues (Easley &
Easley, 1992) and cooperative learning (Johnson & Johnson, 1991)?
Is it the right time to refer to spherical geometry? In examining the
invalidity of testing in this chapter, I will raise questions of logic, 
pedagogy, learning activity, difficulty, and utility. Already it should 
be apparent that it is not reasonable to suppose that any one task 
represents its category adequately. Can one suppose that a student's
performance on one item will reveal how that same student would 
perform on another? 

The many transformations of a mathematics problem extend 
beyond mere restatement, examples of which are shown in figure 
5.1, into multidimensional extension. The transformation of tasks 
is further exemplified by the five questions in the previous para- 
graph. The various forms and language of a teacher presentation are 
part of what is learned by the child. Children have a considerable 
capacity for recognizing item type and transformation. Capacity 
grows as experience grows. The teacher contributes not only by 
drawing attention to performance tasks, but by engaging students 
in expository discourse about both large and small transformations.

In the mathematics classroom, such transformations arise, 
over and over, minute by minute. Some are simple, some are
complex (Scheffler, 1973; Romberg, 1987). Though many transfor- 
mations are deliberate, mathematics teaching takes the envelope 
of transformation largely for granted. So do students, parents, 
administrators, and policy makers. All find simple ways of repre- 
senting inventories of operations and tasks. The labels people use 
for identifying the domains or topics or families of mathematics 
items suggest a homogeneity and generality that lead us to summa- 
rize performance by a single test score. Test scores seriously 
understate the diversity and complexity of teaching and learning. 
Mathematics education, then, appears to be more coherent and
simply structured than it is. 

The Artificiality of Achievement as a Construct

It is not artificial for a closely observing teacher to describe how
well a student has worked a mathematics exercise or project. It is
not artificial to indicate how many problems or test items were
answered correctly. It is not artificial to conclude that a student has
achieved a level of mastery, at least for the time being, over a
particular group of exercises. But to generalize broadly about
achievement is artificial. It is common for people to treat mathematics
achievement as real — but risky. It is artificial and risky to
conclude that a student has achieved proficiency over a type of
exercise such as addition of fractions or the binomial theorem. It is
artificial and risky to allude to achievement of a content so vague 
as sixth grade mathematics. 

In speaking of mathematics achievement, one alludes to a 
selection of mathematics to be taught or that has been learned. As 
indicated in the previous section, I choose to call such a selection
an inventory. An infinite number of mathematical tasks exists.
For any one moment, just what mathematics are we talking about?
The selection is identified by the inventory. The inventory will not 
include ''all of mathematics," much of which even the brightest 
mathematician does not know. We are thinking of all of the 
mathematics of interest just now. It could be all the mathematics 
taught in this school in sixth grade or all the mathematics needed 
for a student to be reasonably qualified when entering an engineering
program at the state university. Inventories need not be
consciously itemized but must have substance, structure, boundaries,
and lots of detail.

If the concept of mathematics achievement is to be useful, the 
inventory of the mathematics potentially achievable needs to be to 
some extent delimited and realized. Does the inventory include 
multiplication of fractions, the simple uses of a hand calculator, 
applications of the Pythagorean theorem, a notion of the work of 
Bertrand Russell, orienteering? Specification of the contents of the
inventory need not be formal; it can be experiential and intuited —
though, if there is to be meaningful dialogue about mathematics
achievement, there should be some shared meaning of the inventory. 

Each inventory may relate to goal statements, textbook exer- 
cises, and item pools. Subsidiary domains and boundaries are 
inevitably inexact. Each iormulation ot mathematic:; learning, 
each formalized inventory, is an umbrella for a vast array of tasks, 
habits, skills, and knowledge. Such formulations are large under- 
statements of the mathematics intuitively and properly included 
in the practicing inventories of teachers. 

If there were strong interdependencies among the many
domains of mathematics, the need for an elaborate inventory
would diminish. Were advanced skills simply composed of "prerequisite"
skills, as claimed by Gagné (1967), the inventory could
be easily specified. Were knowledge of calculus derivable from
knowledge of trigonometry, we could use the latter to indicate the
former. How much of a definition of education can be derived?
When we have a well-developed set of relationships about the 
construction of an entity and can directly measure some of the 
parts, we can calculate other characteristics. The derived measures 
are not artificial. 

But, in spite of its reputation as a logically coherent aggregate 
of knowledge, in spite of the common view that advanced math- 
ematics skills are determined by prerequisite skills, we have no set 
of formal relationships that bind together the many domains of
mathematics and mathematics achievement. Not only does under- 
standing simultaneous equations remain largely independent of 
understanding permutations, but even the knowledge of fractions 
and the knowledge of decimals remain largely independent — a fact 
that leads me to presume that no universal system is possible. For 
now, at least, the best epistemological relationships we have are 
few, partial, and hypothetical. It is obvious, for example, that 
successful long division requires some competence in subtraction, 
but mastery of many subtraction problems cannot be assumed,
given mastery of the main types of long division. The field of 
mathematics is too complex and dissociated to permit interpretation
of one aspect of mathematics achievement on the basis of
knowledge of another aspect. 

The Dissociation of Mathematics

A person has many mathematics knowledges and skills — each
being simultaneously used and forgotten, forever incomplete. Many
knowledges and skills are related but few are highly interdependent,
their dependence complicated by their incompleteness. When
mathematics is considered in the broadest sense, a surface of 
personal achievement stands near zero in many places and rises 
irregularly and not very predictably in others.

As mentioned before, we seldom are thinking of all mathematics.
We usually limit our thoughts of mathematics achievement
to those things covered by certain goals or particular chapters
or teaching of a specific grade in a specific school, a terrain much 
more circumscribed. In spite of its reputation as highly integrated
— that is, as a succession of prerequisite learnings — the
topics of mathematics are quite dissociated. The same is true of
mathematics education. Generalizing from one aspect of achievement
to another is problematic. For example, though many people
comprehend both, understanding symmetry remains independent
of understanding orthogonality. Often our concern is with the 
content of mathematics and not with some generalized notion of 
mathematics ability. The content associated with a goal or con- 
tained within a lesson is often too heterogeneous for a few items to 
represent the rest. 

Often our representations, our talk, does not require preci- 
sion. There are times when all we want is a rough indication of how 
much a youngster has achieved. A teacher's recitation questions, 
chapter tests, and midterm grades, for example, serve as rough 
indicators of achievement. These approximations are bolstered by 
teacher knowledge of what has happened in the classroom. Achievement
is assessed with reference to an inventory. (For outsiders, and
even for the students, much of the teacher's inventory of the 
mathematics covered remains vague.) Among experienced teach- 
ers especially, the mathematics content is shared through custom 
and conversation. Still, precision of assessment is out of reach.
Each teacher's inventory is different. It is important to recognize 
that as used by even the most knowledgeable teachers and testing 
specialists, the concept of mathematics achievement is artificial 
and imprecise, suitable at times for casual reference but a questionable
basis for indicating how much mathematics a student
knows. 

The concept of mathematics ability, sometimes deduced
from performance on achievement tests, suffers from the same lack 
of common inventory of the mathematics covered. Ignoring con- 
tent, the construct, mathematics ability, is useful as an indica- 
tion — relative to other learners — of how much learning time or 
teaching effort will be required in subsequent courses. Relative 
standings remain quite stable for a fixed group of students, stable 
even as individuals pass on into other comparable groups. 

Notwithstanding useful predictions, the way the concept of 
mathematics ability has been contrived has been injurious. Many
test specialists, especially those advocating an item-response-theory
approach to assessment (Hambleton, 1989), force disparate
aspects of mathematics into a single indicator, omitting from their
definitions those tasks and forms of knowledge that do not fit their
scales nicely. The injury occurs when teachers, failing to see
certain topics included in the tests, drop those topics from the
inventories to be taught. Topical items most useful for predicting
mathematics achievement are not necessarily good for defining it.
Mathematics ability and mathematics achievement are not interchangeable
concepts.

When it is necessary for us to estimate, to generalize over
unknown terrain, to presume the nature of the whole, to work with 
entities the contents of which have not been specified, our measures
of the whole are artificial. Except in the simplest situations,
the formal measurement of mathematics achievement is artificial. 

Teacher Conceptualization of Mathematics Education

As every teacher knows, there are shortcomings in education in 
America — including teacher shortcomings. If ours were an excellent
educational system, thousands of teachers presently teaching
would not be teaching. But the fact of the matter is that the
teachers we have are one of the stronger assets of the system, much
stronger, in my opinion, than the administrators, the textbooks,
the willingness of students to be students, and the tests. All
could be better, of course. And at times, each of us believes, "If I
were to yield to the pressures to change, conditions would become
even worse." We are more or less locked into a mediocre educational
system and appalled at the prospect of further deterioration.
The tests that show our national achievement to be poor are not 
wrong but, as part of the problem, drive schooling toward standardized,
authenticated mediocrity.

Current teachers are an asset mainly because they have a 
long-developed and far-reaching conceptualization of the connections
of ideas and behaviors that constitute a certain high school
course or the year-long lessons for a particular elementary school
grade (Lieberman, 1984; Connell, 1985; Lampert, 1988). An experienced
mathematics teacher has a strong idea of what topics should
be covered, the calendar and time allotments involved, the relationship
and interdependence of topics, the nuances and subclassifications
of topics, diverse applications of topics, the relevance of
topics to standardized testing, opportunities for enrichment and
cooperative learning, nurturing independent thinking and self-directed
learning, ways of increasing motivation and decreasing
discouragement, what the stumbling blocks will be, how socialization
and conflict preempt academics, what experience students
bring to the classroom, the expectations of students, parents, and
other teachers. The work of teaching is complex.

I want to emphasize again the complexity of mathematics 
education. Somewhere in the mind of each mathematics teacher an 
inventory of the topics to be taught exists. Each topic, alone, is as 
intricate as a tree, with large and small branches, with twigs and
lacy buds and leaves, individually dispensable but collectively
vitalizing the tree. The parts are comprehensive and capable of
personal interpretation. Trunk and main limbs are represented by 
the classification of goals, objectives, chapters, and types of prob- 
lems — extremely important as structure and discipline for the 
emergent learnings of mathematics. But the connection of mathematics
to experience (playing store or calculating water needs for
a camp-out), to other subject matter, and to preparation for livelihood
depends on the teacher's acquaintance with limbs and foliage.
More than anything else, I believe, it is the teacher's comprehen- 
sion of the subject matter that distinguishes between professional 
work and "functionarial" extension of the district curriculum 
office. 

The district curriculum office has a critical role to play and 
many mathematics coordinators play it well. Individual teacher 
conceptualizations need to be nourished and protected; and in the 
case of those conceptualizations of teachers that are misguided or 
error ridden, redirected. The district coordinator supports the 
strong teachers and looks for ways to make students less dependent
on the weak ones. The competent coordinator works to assist 
teachers in resisting the impairment of their conceptualizations by 
standardized testing. 

These conceptualizations of mathematics on the part of 
teachers and particularly the inventory of mathematics to be 
taught are the critical epistemology of education. Comprehensive- 
ness, integrity of content, and topical uniqueness are no longer 
certain to be found in the minds of superintendents, in the presen- 
tations of textbooks, or in the coverage of tests. No one is reading 
John Dewey. Few mathematics teachers know a mathematician 
with whom to discuss their field. Authority in subject matter is 
being preempted by the syllabi, the textbooks, and the tests — each 
with a leaning toward simplification, increasingly myopically bent 
on raising (potentially embarrassing) achievement scores. The 
situation is in flux. Within the context of that still vital autonomy
exercised in most classrooms, the complexity of teaching continues.
It draws upon that repository of mathematics education —
content and technique — in the minds of the teachers, an impoverished
cache, but a precious asset.

Teacher Assessments

In this last decade of the twentieth century, education remains
labor intensive. Efforts to automate teaching have been largely
unsuccessful. Why do we continue to put at least one expensive
laborer in each classroom of every school? It is not because teacher 
unions are featherbedding. It is not because it takes a scholar to 
maintain discipline. It is not primarily to assure the choice and 
presentation of subject matter. We place almost 3 million teachers 
in American classrooms because managing the conditions for 
learning requires constant attention — recognizing readiness to 
learn, the uniquenesses of students, obstacles and intrusions; 
perceiving "progress within wrong answers," a never-ending assessment
of student achievement.

Following the ordinary practices of schooling, assessment 
summaries by teachers are not used to inform school adminis- 
trators or the public of the scholastic integrity of the school nor 
to provide parents an understanding of career prospects for their 
children. Such reporting of educational progress is beyond the 
current skill of teachers, at present and in the future. As de- 
scribed repeatedly in the research literature (Gage, 1972; Giroux, 
1988; Lortie, 1975; Resnick, 1989; Rosenshine, 1970), teachers
use their ineffable, informal assessments to direct the activities 
of learners, reallocating time on task, recognizing patterns of 
idiosyncratic thinking, modifying interpretations. It is a form of 
assessment based little on a science of education, attendant 
little to formal testing; it is rather an intuitive artistry that 
matures in the reflective experience of day-to-day teaching.
Artistry rather than technology prevails because education, not 
mere training, is a highly individual experience. Many gover- 
nors, newspaper writers, and educators claim that more of 
teaching should be decided centrally, in advance, and standardized
across classrooms. Even in the decentralized school, to a
certain extent, major school goals are prespecified and common.
In all schools, teaching varies from room to room for good 
reason: Each school is different; each teacher is different; the 
children are different. We educators and researchers back away 
from the state of the art when we support blanket prescriptions 
for heterogeneous schools and youngsters. Our formal plans are 
embarrassingly simplistic when compared to the routine and 
intuitive conceptualizations held by teachers. Teacher-made 
assessments are essential. To organize each course of study to fit 
national specifications would be more than a revolution; it
would overthrow all of the serious notions of education we 
currently hold. Teaching has developed as an art; as a technology,
it is far less sophisticated.

Representing Education

As all people do, teachers use simple representations. Their course 
outlines and lesson plans briefly list topics and activities. To satisfy 
the requirements of administrators or to talk to visiting parents,
they sometimes refer to lists of objectives such as those articulated 
for the state of Georgia, abbreviated in table 5.2. But in thinking of 
how and what they will be teaching, teachers work at a much 
higher level of complexity. 



TABLE 5.2. GOALS FOR EDUCATION IN GEORGIA (Excerpted) 

The Georgia Board of Education has adopted student goal statements that
identify the ideal skills and attitudes a graduate of Georgia's educational system
should strive to achieve through instructional programs in the state public
schools. The State Board believes that the instructional program in the public
schools should provide each individual with opportunities to develop abilities so
that he or she
communicates effectively
uses essential mathematics skills
recognizes the need for lifelong learning
has the background to begin career pursuits
participates as a citizen in our democratic system

[And for mathematics in Georgia:] The mathematics section of the Quality
Core Curriculum consists of objectives relating to concepts, process skills and
problem solving at each grade level, kindergarten through eighth. In grades 9-12,
objectives are given for each mathematics course. Mathematics began and
continues to be a way of organizing one's world, through the study of quantity
and space, their properties and the relationship[s] within and between these
concepts. Mathematics is first experienced as a language created to describe the
world, accompanied by rules that govern its use.

[And for Algebra 1, the Topics/Concepts are identified as:] E. Polynomials
13 Identifies polynomial expressions
14 Adds and subtracts polynomials
15 Uses laws of exponents necessary to perform polynomial operations



Complexity of teacher thinking was illustrated earlier in 
figure 5.1 by seven mathematics items. To a testing specialist,
these items are points on a single scale; they measure essentially 
the same thing. To a teacher, each is unique. The statistical 
correlation among the seven would run high, but each item requires
its own understanding of terms and operations. The mathematics
teacher extends instruction to the details of each item.
Getting any six of the items right does not assure getting the
seventh right. To a teacher, mathematics achievement is not a 
matter of getting the best score on the test, but of understanding 
and performing the work. 

Within mathematics education, far more interweaving and 
interdependence of meaning occur than is apparent on a list or 
content-behavior grid (J. Wilson, 1971, p. 646). What if we tried to 
represent the similarity (proximity) and sequentiality (directions) 
of mathematics topics? The hypothetical map in figure 5.3 might 
stimulate our thinking. Here, for example, the topics of 
trigonometry appear closer to geometry than to arithmetic. If we had more 
detail, we would expect to see percentage lying closer to fractions 
than to probability. Such a two-dimensional map raises many 
questions but turns out to be as unsatisfying as a list. The relation- 
ships overwhelm the mapping. 

Yet, when we analyze what a teacher is doing, we find topics 
and activities connected in logical ways as if all were mapped there 
in the teacher's mind. When we ask for a representation, the teacher 
seldom produces a detailed guideline as to what teaching fits where. 
Indirectly more than directly, the teacher has transformed complex 
epistemological relationships into a course schedule and on-the-spot 
action and reaction. When we analyze the thrust, we find teaching 
not aimed at developing some general mathematical ability but at 
developing a knowledge of specific topics and skills in solving 
specific kinds of problems. The inventory is the tacit map by which 
the pursuit of knowledge is rationalized. 

Mathematics teachers incorporate anticipated student be- 
havior into instruction. They allocate a large portion of time to 
operations and problem solving. Their conceptualization of 
mathematics teaching is process oriented more than it is outcomes 
oriented. The teachers strive for high quality experience, 
immersion in the topic, honing the particular operation. Few 
mathematics teachers think first of making children "numerate" or (unless 
harassed) pumping for better scores on an achievement test. Their 
first aim is to help children gain command of a far-reaching, 
unspoken inventory of subject matter, outlined perhaps as the 
National Council of Teachers of Mathematics (1989) proposed but 
extending to a network of detail as salient itself as the major 
classifications. 

How does a sixth grade teacher approach the lesson? Figure 5.4 
is my impressionistic representation of choices made by a teacher as 
to what to teach on a certain day about the Pythagorean theorem. 
The topic is mentioned in the state list of learner objectives and 
identified in the district curriculum guide and the textbook the 
teacher is using. To a degree, the textbook author defines what will 
be taught but, especially in recitation, the teacher modifies course 
content to fit the situation, noting especially the frame of mind of the 
present student group. Reflecting on the many pertinent topics for 
the class (topography A, bottom of the figure), the teacher considers 
the facts, concepts, relationships, and applications of the Pythagorean 
theorem. Distribution B represents a closer look at what is most 
relevant for these particular sixth graders. The teacher draws 
elements from several knowledge bases (circles C, D, and E) to obtain 
a small selection to teach (plate F), then thinks about learning 
difficulty (cylinder G) and inserts it as content (tray H) for this class. 
The teacher anticipates a small presentation with graphics, reading, 
seat work, and homework. When he or she presents the Pythagorean 
theorem in class, ideas are modified as the conversations of instruc- 
tion occur, shaped, of course, by the teacher's overall concept- 
ualization of mathematics education. 

The naivete of figures 5.3 and 5.4 is obvious; no less so that of 
the lists of tables 5.1 and 5.2. Graphic technology to represent 
pedagogy and epistemology is not highly developed. Classification 
systems and content-skill grids are common in curriculum 
offices but there are few devices to represent the conceptual links 
between topics and guide pedagogical moves from one content to 
another. Yet, just as ancient travelers reached destinations before 
there were maps, teachers teach without maps, build without 
blueprints. Intuitively, good teachers merge topical paths, capitalize 
on personal experience, and draw out and preserve the students' 
lines of thought. 

And let us project these graphic representations further to 
include the teacher's recollection of student achievement. We note 
first that most students are remembered primarily as members of 
a class that experienced the scheduled instruction and engaged (to 
an intensity varying across and within students) in mathematical 
tasks. During the school term the class was exposed to the teacher's 
inventory of mathematics pertinent to that term. Both class and 
inventory are remembered as ordinary in certain ways and unique 
in certain ways. The teacher retains an awareness of the individual 
student's aptitudes for mathematics, previous experience, and 
readiness to learn. When required to submit a grade for the term or 
to brief the students' next teacher about them, the teacher will 
report such general characteristics. But when assessing what the 
youngster has achieved and how his or her own teaching might be 
revised, the teacher invokes the inventory of topical experiences 
and tasks, recalling work completed, noting moments of insight 
and misunderstanding. My representation of the teacher's percep- 
tion of student achievement appears as figure 5.5. The emphasis 
there is on time and tasks, not on student abilities. 

On these pages so far, I have described the teaching and 
learning of mathematics as enormously detailed. It is apparent, on 
the basis of their words and activities in the classroom, that the 
conceptualizations of teachers as to what constitutes a course 
greatly influence their planning, instructional strategy, and 
assessments. Although the writings of mathematics educators, school 
district syllabi, textbooks, and tests can be said to be built 
intellectually on a more powerful structure, these formal conceptualizations 
of mathematics education do not identify a great many of the 
characteristics of mathematics achievement important to teachers. 
Just what information is available from standardized 
mathematics achievement tests will be dealt with next. 



TESTING AS INFORMATION GATHERING 

The public does not understand how there could be sincere objec- 
tion to using standardized achievement tests to represent what 
should be taught. People think the tests measure what students 
know and do not know. They correctly see the tests as nationally 
based and technically elegant. They presume that tests valid for one 
educational purpose are valid for others. As students for many years 
themselves, they have experienced teaching and testing; the ques- 
tion of alignment almost never came up. Today, when a mismatch 
is claimed, often they presume that the teachers are wrong. Most 
administrators, counselors, and board members do not question 
testing. Many interpret objection to testing as an unwillingness to 
acknowledge shortcomings in the educational system. Dealing with 
these problems requires a careful look at the standardized 
achievement test as an information-gathering instrument. 

Testing, broadly considered, is the presentation of certain 
challenges with responses judged as right or wrong. Achievement 
testing is an activity that generates student performances to be 
interpreted as above or below pedagogical criteria. A test score 
indicates performance on a collection of items. Test makers con- 
ceptualize what constitutes a good performance. The score is taken 
as an index of achievement, a datum, a bit of information. For most 
people, testing is seen mainly as an information-gathering activity. 
After the information is put to use, we can speak of the validity of 
the interpretations. 

Information from testing can be treated both as measurement 
data and as data for pondering a problem (Lorge, 1951). When we 
have measurement data, we think as if we have captured a 
dimension of something real, something substantial, such as the 
measurement of age or hat size. For people having measurement 
appetites, and that includes most of us, standardized tests are 
expected to provide a trustworthy indicator of student 
achievement (Cronbach & Meehl, 1955; Shavelson et al., 1987). Even if the 
scores are not entirely accurate, there is the expectation that the 
amount of something real is being expressed. 

Even when we do not know the validity of the measurements, 
the information from testing can be problem-pondering data, thought 
provoking, the basis for hunches, actually helping to shape strategy 
because strategy is based partly on subjective judgment. These are 
formative data, potentially useful for redeveloping an idea or 
practice. It is important for a teacher to consider both possibilities 
when reviewing test results. Much of achievement testing will fail 
to provide teachers with dependable measurements, yet be useful 
for tactical review and reconceptualization; it will provide far more 
than a guess, far less than a causal link. 

Testing is more than information gathering. It is a manage- 
ment control mechanism as well. It is used to announce purpose 
and priority. It is scheduled in advance by administrators so that 
effort is shifted toward particular goals. It is an instrument of 
reward and punishment. Testing and other forms of assessment are 
widely seen as essential to accountability and educational reform 
(House, 1978). The effectiveness of the testing process is often seen 
in terms of the contribution it makes to the management of 
classroom, school, and school system (Stake, 1991). 

In the third part of this chapter, I will present data on the 
promise, use, and effects of testing, as viewed by a national sample 
of mathematics teachers. My main interest in the survey was to see 
what changes, if any, in schoolwork, particularly in curriculum, 
the teachers attributed to the national emphasis on testing. In the 
next section, I will discuss the common expectation of testing as an 
information-gathering activity. 



GENERALIZABILITY OF MATHEMATICS 
KNOWLEDGE FROM STANDARDIZED TESTS 

It is widely supposed that a good mathematics test will indicate 
the amount of student knowledge of the mathematics broadly 
represented by the test items (Floden et al., 1978). Actually, 
standardized tests indicate very little as to how much 
mathematics a student knows (Haertel, 1985; McLean, 1982b). They do not 
directly measure how well educated in mathematics the student is 
becoming. They do not identify the cognitive structures of children's 
thinking (Easley, 1974; Piaget, 1929). Some indirect inferences 
can be made by teachers having a good understanding of mathemat- 
ics as subject matter and how students do mathematics, but the 
tests add little to what the teacher already knows. Unfortunately, 
the public is almost invited to make the mistake of concluding that 
the best learners know all that has been taught and the slowest 
learners little. 

When used with an ordinary group of students, that is, a 
heterogeneous group from the school's catchment area, many stan- 
dardized achievement tests do effectively indicate which students 
are the best learners, the ones whose achievement is superior, the 
ones most quickly becoming educated in the core curriculum of 
schools. Middle-range and low-performing students also are con- 
sistently identified. Tests are poor indicators of future performance 
of poorly motivated students who subsequently become inspired or 
of zesty students who in midstream lose interest in academics. But 
there are few of either of these; approximately the same ranking of 
students will be found in their courses next year. Test score 
interpretation often is valid when predicting standing in the same 
or an equivalent group at a later time. The tests can profitably be 
used to indicate which students probably can handle an accelerated 
or advanced mathematics course. 

With high correlation expected among different mathematics 
tests, test specialists as well as educators and the general public 
have come to expect a good test to indicate attainment of knowl- 
edge as well as student ranking. With reference to the items of 
figure 5.1 (shown earlier), these people would expect good 
performance on Item #1 to indicate a holding of the knowledge needed to 
do Item #2 and, to a smaller extent, the knowledge needed to do the 
five less similar items. 

It is true that the students who do best on Item #1 will tend 
to be the ones who do best on Items #2 through #7. But how well 
a given student will do on those six items is not indicated by 
performance on the first. Performance on the last item does not 
indicate how well a student will do on the first six. (Doing well here 
refers to quality of performance on the task regardless of how other 
students perform.) Items belonging to a topical family are not 
necessarily indicators of how well students will do on other items 
within the family and so, of course, are not indicators of how well 
students will perform on mathematics items generally. 

For a particular group, Item #1 might be easy. Let us say that 90 
percent get it right. Even when the items were selected to have a high 
interitem correlation, such information about Item #1 gives us no 
idea about how difficult another item is. Even highly experienced 
teachers make poor estimates of task difficulty. A group of test items 
does not provide achievement information on tasks not tested. 
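
The distinction between ranking and difficulty can be sketched with a 
small simulation. The data below are invented for illustration (one 
hundred hypothetical students and a made-up right/wrong rule for each 
item), not drawn from any actual test. Both partner items correlate 
positively with Item #1, yet one is answered correctly by 80 percent of 
the group and the other by 20 percent. 

```python
# Hypothetical data: 100 students ordered by a single invented "ability" score.
abilities = list(range(100))


def item_responses(threshold):
    """Return 1 (right) or 0 (wrong) per student; right if ability >= threshold."""
    return [1 if a >= threshold else 0 for a in abilities]


def proportion_correct(responses):
    """Share of the group answering the item correctly (the item's easiness)."""
    return sum(responses) / len(responses)


def pearson(x, y):
    """Pearson correlation; for 0/1 data this is the phi coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


item1 = item_responses(10)          # 90 percent get it right
easy_partner = item_responses(20)   # 80 percent get it right
hard_partner = item_responses(80)   # 20 percent get it right

r_easy = pearson(item1, easy_partner)
r_hard = pearson(item1, hard_partner)

print("Item #1 proportion correct:", proportion_correct(item1))
print("easy partner:", proportion_correct(easy_partner), "r =", round(r_easy, 3))
print("hard partner:", proportion_correct(hard_partner), "r =", round(r_hard, 3))
```

In this toy group, the students who do well on Item #1 are the same 
ones who do well on either partner item, but nothing about the 90 
percent figure says which partner the group will find hard. 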

Actually, a little more can be deduced. Specialists assembling 
items for standardized tests want some range of difficulty among 
items but not too much. To maximize differentiation and to 
enhance the validity of predictions, they seek items that half the 
students will get right, half will get wrong. Still, they do not want 
to discourage examinees by initial items that are too difficult nor 
to embarrass educators by tests that appear too easy. They try to 
select a majority of items of about the same difficulty, headed by a 
few easy items, ending with a few difficult ones. With this knowl- 
edge, someone examining the test items, when aware of the 
difficulty of a few items, can make some guesses as to the difficulty 
of others. But this serves only as a basis for estimating achievement 
on other items on the test. It is not a basis for estimating whether 
or not the examinees would do well on problems not on the test. 
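
One way to see why specialists favor items that about half the 
examinees answer correctly: a single right/wrong item tells two 
students apart only when one gets it right and the other does not. The 
sketch below assumes a hypothetical group of 100 examinees and simply 
counts such pairs for items of varying difficulty; the function name 
and the group size are illustrative choices, not part of any testing 
standard. 

```python
def pairs_separated(n_students, n_correct):
    """Count student pairs that one right/wrong item tells apart:
    one member of the pair answered correctly and the other did not."""
    return n_correct * (n_students - n_correct)


n = 100  # hypothetical group size

# Pairs separated by items that 10, 30, 50, 70, and 90 students get right.
separated = {k: pairs_separated(n, k) for k in (10, 30, 50, 70, 90)}

# The number of correct answers that separates the most pairs.
best = max(range(n + 1), key=lambda k: pairs_separated(n, k))

for k, count in separated.items():
    print(f"{k} of {n} correct: {count} pairs separated")
print("most differentiating item has", best, "correct answers")
```

The count peaks when the group splits evenly and falls off 
symmetrically for very easy and very hard items, which is consistent 
with the item-selection practice described above. 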

As indicated previously, student performance on a great range 
of mathematics items is remarkably correlated. The more able 
students tend to find almost all items easier than the less able do. 
A group of students will show achievement at about the same level 
on all mathematics items of similar difficulty. Were there to be an 
inventory of mathematics items having average difficulty equal to 
the difficulty of items on the test, a group of students would 
perform about as well en bloc as they performed on the test. It is 
possible to conceptualize a curricular inventory of mathematics 
achievement selected solely on the basis of a certain mean diffi- 
culty. A teacher noting a mean score on a mathematics test could 
generalize as to achievement on this generic inventory. 

But it would be a fanciful exercise. If we are genuinely 
interested in the education of youth, we want to see students 
becoming knowledgeable and skillful as to particular 
mathematics. No domains in mathematics are of absolute value; yet each 
domain is of more or less value to a robust concept of mathematics 
education. The inventory, however poorly specified, is where 
teaching starts. Our lessons, our textbooks, and our tests need to be 
aligned with what we want mathematics education to be, because 
they influence what teachers teach and what learners learn. We 
should not delude ourselves into thinking that domains identified 
by experts (other than teachers, perhaps) capture the essence of 
mathematics education. Valid interpretations of achievement scores 
can reflect different definitions of mathematics education; teacher 
interpretations will reflect teacher definitions. 

Neither specialists in curriculum nor technologists of testing 
have suitably refined and reported the inventories in their heads, 
much less those in teachers' heads (Archbald & Newmann, 1988). 
The categories of items of the standardized mathematics achieve- 
ment test used in the upper sixth grade at Duxbury Intermediate are 
shown in table 5.3. 

The authors go on to identify each item as to number of digits, 
operation, and units (if any), but it is safe to say that they do not 
know how to extend the item map into the rest of mathematics. 
None of us does. None of us commands the language or graphics 
that details the similarities teachers feel between decimals and 
fractions, that shows how teachers draw upon understanding of 
one-step problems to do two-step problems, or that illustrates how 
teachers use scale conversions in problem solving. We can give 
examples, we can demonstrate, we can show how we would teach, 
but we lack a language to represent those similarities, textures, and 
relationships. Test producers and others have little linguistic 
technology for sharing with test users their definitions of math- 
ematics content. The users of mathematics achievement tests are 
pretty much on their own to decide what mathematics, other than 
those items actually on the test, is being referred to when we 
conclude that an examinee is a high achiever. 

For a teacher looking closely at the particular items of the test, 
scores provide a rough indication of how well students would 
perform subsequently on similar items. Even the best teachers, 
however, are often wrong in presuming what content is "similar." 
Standardized mathematics achievement tests are pretty good at 
indicating which students are best and which are poorest at 
learning school mathematics and pretty good at indicating (for 
those teachers who find it useful to believe in it) a general 
mathematics ability, but quite poor at indicating which knowledge and 
skill the students have actually attained. 



ALIGNMENT OF CURRICULUM AND TESTING 

Various writers have noted the difference between the curriculum 
outlined by official goals or syllabi and the curriculum taught by 
teachers (Eraut, Goad, & Smith, 1975; Berlak et al., 1992; Aoki, 
1983). These differences are natural and substantial. Even when 
coordinators and teachers share both the aims and concepts of 
teaching method, there is no way for coordinators to state precisely 
what the teachers should be teaching. Even when the teaching is 
not very good, the profundity of what goes on in the teaching 
process defies description. Articulating objectives and teaching in 
the classroom are two very different media for defining education. 
In good times, the two provide a dialectic that refines both stating 
aims and teaching. In bad times, they conflict, they embarrass, they 
deceive. The differences, of course, are often more than differences 
in the medium. Official goals and actual practice often project 
different aims. Values, needs, and conceptualizations can differ, 
too. In most efforts to reform education, there is a presumption that 
education would improve if the stated curriculum and the actual 
curriculum were more congruent. 

In the same vein, no textbook perfectly reflects a school's 
official goals. The outline of content for sixth-grade mathematics 
in Duxbury in 1989 is compared in table 5.4 to the sixth grade's 
textbook chapter titles (as seen also in table 5.1). The terms in the 
two columns match word for word for many topics. The level of 
detail that district planners and text authors had in mind differs. 
For example, the Duxbury curriculum guide also lists thirty-three 
anticipated student outcomes categorized as Knowledge, Skills, 
and Attitudes. Almost nobody expects a closer match than shown 
in table 5.4, but it is important to recognize that only the headings 
match; a great body of detail will not match. An example of a 
mismatch in teaching fractions would be for the district outline to 
call for division of sets into subsets as the dominant representation 
and the textbook to rely almost exclusively on pie charts. A 
mismatch might be troublesome when the district outline calls for 
two weeks on geometry and the textbook devotes only five pages 
to it. It might be a serious mismatch if the textbook only contained 
exercises on the construction of graphs and district objectives 
emphasized interpretation of graphs. However, the more the 
curriculum is in the hands of the teachers, the less we need worry 
about a mismatch in materials. Usually the teachers take the 
discrepancy in stride, indifferent to, sometimes too indifferent to, 
the thrust and boundaries set by both textbook and guide. 

TABLE 5.4. DISTRICT OUTLINE OF CONTENT AND TEXTBOOK 
CHAPTER TITLES FOR A SIXTH GRADE MATHEMATICS CLASS 

District Outline                                    Grade Six Textbook Chapters* 

 1. Addition and Subtraction of Whole Numbers        1. Addition and Subtraction of Whole Numbers 
 2. Multiplication and Division of Whole Numbers     2. Multiplication and Division of Whole Numbers 
 3. Introduction to Decimals                         3. Decimals 
 4. Multiplication and Division of Decimals          4. Multiplication and Division of Decimals 
 5. Number Theory                                    5. Geometry 
 6. Addition and Subtraction of Fractions            6. Factors and Multiples 
 7. Multiplication and Division of Fractions         7. Addition and Subtraction of Fractions 
 8. Geometry                                         8. Multiplication and Division of Fractions 
 9. Percent                                          9. Probability 
10. Probability                                     10. Statistics and Graphing 
                                                    11. Ratio, Proportion, and Percents 
                                                    12. Measurement 
                                                    13. Perimeter, Area, and Volume 
                                                    14. Integers 
                                                    15. Using Triangles 

*These chapter titles were presented in table 5.1. 

And similarly, testing will not and cannot cover precisely the 
same ground as textbook, syllabus, and especially, teaching 
practice. In particular, teaching practice that follows traditions of local 
control and teacher autonomy will have different thrusts and 
boundaries than standardized tests. Here again, differences in 
language prevent perfect agreement but the differences are likely to 
be greater than the choices in terms. Test authors and teachers can 
be expected to differ in their definitions of achievement. 

At some level of generality, the curriculum being offered and 
achievement testing should share an inventory of mathematics 
content. The word used by researchers (Komoski, no date) in the 
United States to indicate the match between the inventory to be 
covered in teaching and the inventory of testing is alignment. It 
seems to most people that teaching and testing should be aligned 
(Freeman et al., 1983b; Mehrens, 1984; Romberg, 1987, 1992). This 
is not to say that students should never be tested to determine their 
understanding of mathematics operations and concepts beyond 
those taught in the classroom. There are many opportunities to 
learn mathematics outside the classroom, and many teachers 
exploit them. When the purpose of testing is primarily to increase 
understanding of the extent to which a youngster is becoming 
sophisticated, then the testing inventory should not be limited to 
school mathematics. But when the purpose of testing is to increase 
understanding of the attainment of school mathematics, the 
curriculum inventory and testing inventory should be aligned. 

Degree of alignment is difficult to measure. As I have already 
indicated, it is easy to point to examples of misalignment. Some 
test items will not only be unlike exercises assigned but depend on 
learning from domains untaught. Some goals, such as "The student 
will apply the theory and laws governing our number system as 
known at this level," are not easily tested. The test, or its 
extension, the total item pool, is a weak representative of the 
voluminous inventory of intended learnings. The formal 
curriculum guide, though usually considerably more detailed than an item 
pool, is also an understatement of the inventory of desired 
mathematical achievement. Compared to the curriculum guide, the 
coverage of the test will fall short. The fact that the test items relate 
directly to only perhaps 10 percent of the sections in the 
curriculum guide indicates primarily that the test is much shorter than the 
guide, not necessarily that there is misalignment between 
inventories. Matching is problematic. We have no good way of measuring 
the alignment between tests and curricula. 
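
The point that low coverage need not signal misalignment can be made 
concrete with a toy calculation. Everything in the sketch below is 
hypothetical: the section labels, the count of 100 guide sections, and 
the 40-item test are invented, and the two ratios are simple 
illustrations rather than accepted measures of alignment (which, as 
noted, we do not have). 

```python
# Hypothetical curriculum guide with 100 sections, labeled S1..S100.
guide_sections = {f"S{i}" for i in range(1, 101)}

# Hypothetical 40-item test: four items on each of ten guide sections.
# Each entry names the guide section that the item addresses.
item_to_section = [f"S{i}" for i in range(1, 11) for _ in range(4)]


def guide_coverage(items, guide):
    """Share of guide sections addressed by at least one test item."""
    return len(set(items) & guide) / len(guide)


def item_fit(items, guide):
    """Share of test items that fall inside the guide's sections."""
    return sum(1 for section in items if section in guide) / len(items)


coverage = guide_coverage(item_to_section, guide_sections)
fit = item_fit(item_to_section, guide_sections)

print("share of guide sections touched by the test:", coverage)
print("share of test items that lie within the guide:", fit)
```

Here every item lies within the guide, yet the test touches only a 
tenth of the guide's sections. The low figure reflects the brevity of 
the test, not a conflict between the two inventories. Deciding which 
section an item "addresses" is, of course, the subjective step that no 
such tally resolves. 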

Still, it is useful for teachers to compare the two. It will 
become apparent that the test is responsive to some domains and 
not to others. It may also become apparent that the curriculum 
guide fails to include all formats in which problems are presented 
on the test. Usually teachers will decide that students should be 
expected to solve some problems other than those they actually 
worked before. A careful review of achievement test items and 
student performances can broaden and deepen understanding of the 
complexities of education. Testing can contribute to a provocative 
consideration of what we mean by mathematics education. On the 
other hand, a careful review sometimes leads to mindless teaching 
to the test. Alignment usually can be improved by simplifying the 
curriculum. Whether or not high alignment is good needs to be 
decided with reference to our basic definitions of education. 

My assistant, Giordana Rabitti, studied the alignment among 
domains in the curriculum guide, textbook, and standardized test 
for upper sixth grade mathematics in Duxbury. (All of these have 
been outlined in tables 5.3 and 5.4.) She concluded that a compari- 
son of category headings was insufficient and that a comparison of 
the details required extensive and highly subjective judgments. 
From the documents before her, her conclusions could be no more 
than impressionistic. She found in these materials the potential for 
a high degree of alignment, but recognized that careful attention by 
teachers to guide, text, and test would not assure alignment. The 
curriculum as taught depends on the interpretations of individual 
teachers who probably will continue to vary widely in their peda- 
gogical methods and inventories of content. 

Test Scores for Redirecting Instruction 

In 1991, as part of President Bush's call for an educational 
revolution to bring about better schools, it was claimed that reform 
depended on knowing what each child knows. And that knowledge 
should be obtained with a national student achievement test. An 
extravagant claim was put forward in 1991 by Secretary of 
Education Lamar Alexander to the effect that parents have a right to know 
whether their child understands what is needed as a competitive 
worker for the world marketplace and what he or she will need to 
know to be a scientist in the twenty-first century. It is not a matter 
of parent rights. Only the most primitive knowledge needs of today 
are known, much less those of tomorrow. Primitive also is our 
ability to assess what an individual knows, even those who have 
had extensive research study. The knowledge an ordinary child 
possesses remains largely unknown. Luckily, effective education 
systems need only a rough idea of what the learners know. (Those 
are revolutionary thoughts indeed.) 

Teachers have only a rough idea of what children know, yet 
even this is much more than is to be learned from achievement 
tests. Standardized tests do not inventory what students know. 
They tell us mostly which students respond best to the particular 
questions asked. Standardized test scores represent a reason- 
able basis for predicting which students will do best in future 
scholastic assignments but are not a sound basis for redirecting 
education. 

Many people expect standardized achievement tests to have 
diagnostic properties. State legislation aimed at school reform 
dramatizes the conviction that the tests will point the road to 
improvement. Most teachers are skeptical. Over the decades, 
research studies have made it clear that teachers find standardized 
test scores of little diagnostic value (Goslin, 1967; Herman & 
Dorre-Bremme, 1985; Hotvedt, 1974; Tittle, Kelly-Benjamin, & 
Sacks, 1991). The tests seldom inform teachers of previously 
unrecognized student talents and seldom identify deficits in a way 
that directs remedial instruction (Koretz, 1987). Whether referring 
to an individual student, a class, or a curriculum for the entire 
country, standardized achievement tests contribute little to 
redirecting teaching. 

Is this reflective merely of a shortcoming in our teachers? It 
could be that teachers are failing to see the diagnostic information 
embedded in testing and need coaching. Leslie McLean of the 
Ontario Institute for the Study of Education (1982b) found it more 
reasonable to conclude that test data do not fit conceptualizations 
teachers have of education, partly because teacher 
conceptualizations are more sophisticated. Dale Costello, a British 
Columbia teacher studying another teacher's classroom, agreed 
(1988). Teachers increasingly find it useful to look at tests to 
prepare children to take tests, but not otherwise to redirect their 
instruction. 

Why is remedial action not obvious to a teacher? Suppose 
that having taken Form P of the SRA test, a group of sixth graders 
get two-thirds of the computation items correct but two-thirds of 
the pupils miss the two arithmetic-of-fractions items, #23 and 
#26. What should the teacher do? Perhaps these items are more 
difficult; that information is not available. Should the teacher 
allocate additional teaching time to fractions, increasing the risk 
of not getting to the chapter on probability at all? Is computation 
more important than activities calling for complex dialogue or 
engagement in topics of personal interest, neither of which is 
measured by the test? On the basis of experience, calling on some 
intuitive sense of the risks involved, each teacher decides. 
Rational strategies for remediation are missing; heavy emphasis on 
assessment implies that they should be imposed. National reform 
should be based upon teacher conceptualizations of education 
rather than on student performances on standardized 
achievement tests. 

We should be wary of premature calls for technology, 
management by objectives, management by statistical indicators. There 
may be a technological breakthrough, a massive connection of 
inventories and pedagogies that relate test performance, 
immediate instructional tactics, and ultimate educational benefits, 
someday. Education may someday become a science. So far, both 
experience and research have provided a few guidelines but not a 
technology. We have few formal, systematic answers as to what to 
do about low achievement scores. 

Even the informal systems are weak. We might suppose that 
some educators are so experienced with testing and curriculum 
development that, given the results of extensive test performances, 
they can act as consultants to guide curricular change. An Australian 
researcher, Norman Bowman (1979), spent more than a year in the 
Midwest, interviewing, observing in many districts, trying to find — 
wanting very much to find — at least one such expert. He found none. 
It appears to me that testing people are not interested in curricular 
epistemology. Curriculum people who want to participate in matters 
of testing are usually obliged to communicate in the language of 
behavioral objectives, competencies, and multiple-choice items. 
Knowledge of the extensive inventories of mathematics, epistemological 
relationships among domains, and the alternative logics of 
problem solving are seldom topics for the committee on achievement 
testing. There is no science of education as a platform for the 
kind of educational reform President Bush sought. 
Test Score Means Comparing Classes, Schools, States, and Nations 

With strong backgrounds in the social sciences, where compari- 
sons are major stepping stones for theory construction, test devel- 
opers produce data for valid comparisons. As noted earlier, their 
thinking can be categorized in two ways: as either norm referenced 
or criterion referenced (Glaser, 1963; Hambleton, Algina, & Coulson, 
1978). Although these are directly descriptive of an individual 
examinee, the standardized criterion-referenced tests developed so 
far are primarily intended for the comparison of individuals. 

The comparison of groups has become increasingly common 
in recent years. It showed up early in program evaluation studies as 
posttest scores were compared to pretest scores. Later, a public cry 
for accountability of schools drew attention to the mean score for 
schools or districts with reference to population norms. Then, 
researchers raised the question of comparisons among nations, 
creating the International Education Assessment (MacRury, Nagy, 
& Traub, 1987) and, in repeated comparisons, found U.S. students 
performing less well in mathematics than students elsewhere 
(McKnight et al., 1987). In 1991, comparisons among states became 
sufficiently political an issue to cause the National Assessment of 
Educational Progress to pilot comparisons in mathematics (Pipho, 
1991). 

The standings in these comparisons can be expected to be 
relatively stable. Low performances of students from the United 
States, from the southern states, and from urban schools are 
repeatable and will extend to domains of scholastic mathematics 
other than those included in the tests. The problems of 
interpreting the comparisons are many. According to Harvey 
Goldstein of the London Institute of Education, "We are still a long 
way from being able to prescribe a standard analysis which can be 
adopted routinely to provide definitive school comparisons" (1991). 
Differences in means are much more likely to be attributable to 
differences in the student groups tested and the alignment of test 
to teaching than to the quality of teaching. This is not to suggest 
that teaching quality does not vary widely but that little can be 
learned about quality of teaching from student achievement 
testing. 

The main point to be made about a comparison of group 
means follows from the point made at the beginning of this 
section: that test scores can be treated as measurements and as a 
stimulation to thinking. The comparisons of schools, districts, 
states, or nations, even though stable, do not measure which 
mathematics has been learned or how teaching should be redi- 
rected. The comparisons can be useful for contemplation: How 
could performances be this way? How could things be better? Are 
we putting too much emphasis on geometric proofs? Should we 
compromise on the use of handheld calculators? The answers to 
such questions are not to be found in the test scores, but teachers 
who scrutinize the test scores find new ground for pondering the 
questions again. 



TESTING AS A MANAGEMENT PROCESS 

One of the most powerful realizations in understanding student 
achievement testing in the schools is the distinction between 
testing as an information-gathering procedure and testing as an 
education management procedure. Testing, not just the act itself 
but the requirement for testing, redirects the teaching process. By 
requiring standardized testing, politicians and administrators not 
only make education less a professional field but they change the 
definition of what an education is. 

Traditionally, classroom teaching has been didactic — a mat- 
ter of exposing the students to certain knowledge, activity, and 
attitude, then of having them repeat the information, skill, or rule 
on paper, on a chalkboard, or in recitation (Broudy, 1963). The 
teacher makes informal assessments of individual progress and 
decides when to move on to the next topic. For example, according 
to researcher Ulf Lundgren (1972), it is common for teachers to note 
particularly the progress of a certain group of students, those 
consistently at the top of the bottom quarter of the class, in 
deciding the pace of the teaching. Teacher decisions are moderated, 
of course, by external demands that certain content, text material, 
or goals be covered during the year. One way of making those 
demands is to require standardized testing, particularly "high 
stakes" testing, in which schooling is placed in jeopardy if test 
scores are not up to standard. 

School reform advocacy in America today is driven by visions 
of tightened management by central administrators (Eisner, 1992). 
To improve teaching and learning performance, most citizens and 
leaders think it necessary to increase commitment to uniform 
goals and to awaken teachers to their responsibility to the state. If 
state and district officials are to be accountable to their constituencies, 
they need to know what is happening, but they have no time 
to learn. In their view, teaching quality is too complex and obscured 
to be represented in simple indicators but apparently they believe 
learning can be so represented, in the form of mean standardized 
test scores. Thus, school reform appears in public rhetoric to 
require greater centralization of authority and more powerful 
information-processing circuits. 

In a few places, school reform is happening in just the opposite 
fashion. In Chicago and New York City, the movement is toward 
decentralization, toward school-based decision making. In a few 
places across the country, there are efforts to restructure the schools 
to draw more upon the professional responsibility of teachers. But 
such localizing efforts are overwhelmed by the call for core curricula, 
common goals, and standardized testing. In Sweden and Iceland, 
objections to control by Stockholm and Reykjavik have resulted in 
more support for the teacher, less national specification of instruction, 
and less reliance on standardized testing. But the movement in 
the United States and most of the world is toward greater control by 
the government, less honoring of professional experience (particularly 
as to subject matter conceptualization by the teachers), and 
more emphasis on formalized student assessment. 

Policy analysts such as Linda Darling-Hammond (1991), 
George Madaus (1991), and Lorrie Shepard (1991) have questioned 
the plan to orient school reform to national testing. They point to 
the small payoff from an enormous investment in state-mandated 
testing. Several recent presidents of the National Council on 
Measurement in Education, the organization of reference for most 
educational testing specialists, have expressed strong objection to 
management expectations of testing mandates (Cole, 1984; Jaeger, 
1987, 1992; Madaus, 1985). 

Standardized testing in education is an important indicator of 
authority and control. Teachers who regularly looked first to their 
experience, then to textbook authors, and last to their professional 
training to guide their instruction now increasingly appear to look to 
standardized testing to decide whether their inventories and their 
conceptualizations of teaching are acceptable (Smith, 1991; Stake, 
1991). The degree to which assessment-driven reform is injuring 
education, or possibly strengthening it, is difficult to assess. 

Perceptions of Teachers as to How Testing Influences Teaching 

There is validity, and there is perception of validity. Invalidity 
increases as tests are misperceived to be valid and unwarranted 
interpretations are drawn. When testing is used primarily as a 
means of exerting pressure for rigorous teaching and learning, 
resulting actions may be either appropriate or inappropriate. The 
more that teachers see pressure as appropriate, and many do, the 
more they are inclined to accept test scores as a valid representation 
of learning. The more they see mandated testing as inappropriate, 
the more they stick to their own conceptualizations of education. 
To understand more of the complexities of test validity, it is 
important to examine teacher perceptions of the appropriateness of 
mandated testing and the changes in contemporary schooling they 
attribute to testing (Hall, Villeme, & Phillippy, 1985; Kemmis et 
al., 1987; Mattsson, 1989). In this section, I will report results of a 
national survey I did on mathematics teachers' perceptions of 
standardized testing in their schools. 

The data from this national sample of secondary school math- 
ematics teachers indicate that most teachers, even though they 
seldom use test information to guide instruction, recognized little 
invalidity in standardized test information and found the testing 
little interference to instruction. They found testing still on the 
increase and associated more with the good changes happening in 
their schools than the bad. These teacher perceptions may be correct. 

When asked to prepare this chapter on validity, I realized that 
I needed to know more about what was happening in high school 
mathematics classrooms. I had been observing elementary school 
classrooms for some time but had not studied mathematics teach- 
ing at the high school level since 1978. To help close the gap, I 
modified a one-page survey form to capture specific concerns raised 
earlier in this chapter. I retained the survey's basic strategy of 
asking teachers what changes were happening in their schools, 
then asking which of those were at least partially attributable to the 
emphasis on testing. 

Colleagues Bernadine Stake and Aminata Soumare and I 
identified states that had updated lists of mathematics teachers and 
randomly chose twelve, then made a random selection of high 
school mathematics teachers from the lists. In the spring of 1990, 
we sent our single-page questionnaire to one teacher for each 
100,000 of state population. We indicated that our purpose was to 
observe the impact of standardized mathematics achievement 
tests. We received responses from 186 teachers, a 46 percent 
return, but had to discard 10 because the respondents were 
currently not teaching high school mathematics. Soumare did the 
statistical analysis. 
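
The sample sizes quoted in this paragraph can be checked with a short calculation. The sketch below is only an illustration: the number of questionnaires mailed is not reported in the text, so the figure computed here is an estimate inferred from the 186 responses and the rounded 46 percent return, and the variable names are hypothetical.

```python
# Figures taken from the paragraph above.
responses = 186      # questionnaires returned
return_rate = 0.46   # reported return rate (rounded in the text)
discarded = 10       # respondents not currently teaching high school mathematics

# Assumption: the mailing size is inferred, not reported, so treat it as approximate.
estimated_mailed = round(responses / return_rate)

# Usable responses, the base for the percentages reported in this section.
usable = responses - discarded

print(f"estimated questionnaires mailed: about {estimated_mailed}")
print(f"usable responses: {usable}")
```

The usable count of 176 matches the number of teachers referred to throughout the rest of the section.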

On the whole, the 176 teachers told us that the emphasis on 
standardized testing in their school was moderate and getting 
stronger; that is, that it was stronger in 1990 than it had been in 
1980, even stronger than it had been in 1987. A third were supportive 
of the increased emphasis but most had mixed feelings. 

Of the 82 percent whose school had administered a standard- 
ized mathematics achievement test within the two previous years, 
the teachers expressed the following opinions on the validity of the 
test for representing how well students knew what they had been 
taught: 45 percent indicated high validity; 35 percent low validity; 
and 19 percent responded that they did not know. In addition, 58 
percent said the test covered content that they had taught; 7 percent 
said it did not. (It is important to keep in mind the 45 percent 
indicating high validity for representing what they had taught.) 

One hundred twenty-five teachers (67 percent) said that over 
time they had changed their teaching (of the primary topic in their 
first class of the day they responded). Of those, 78 percent said the 
change was not because of the testing. Those changing since the 
last time they taught the course (34 percent) said (again by 79 
percent) that the change was not because of the testing. Sharon 
Dennis, a Seattle mathematics teacher, said she changed "To give 
more background to effect better total understanding." She found 
standardized tests to be having a generally positive effect. 

We asked about the usefulness of testing information to a 
teacher taking over a class midterm from a teacher who had 
departed. Only 9 percent said very important, 50 percent said 
somewhat important, but 40 percent said not important at all. 
Finally they were asked how often was it useful to review test 
scores to prepare for a formal conference with parents of a student: 
10 percent said almost always, 48 percent said once in a while, and 
40 percent said almost never. 

About the validity of the test rankings of students on how 
much mathematics they know: 11 percent (of those responding) 
found the rankings highly similar to their own rankings, 76 percent 
found considerable similarity, and 13 percent very little similarity. 

We asked how well the standardized tests covered the range 
of mathematics each was teaching: 4 percent said extremely well, 
48 percent said pretty well, 33 percent said not very well, and 5 
percent said extremely poorly. 

As to their opinions about school mean scores as an indication 
of the quality of teaching provided collectively by the mathematics 
teachers in the school: 1 percent said they were a precise indication, 
45 percent said they were a rough approximation, and 43 
percent said the test scores were a very poor indication of teaching 
quality. 
In a summary question, we asked: "Based on what you see in 
your own school, how is formal standardized testing contributing 
to efforts to improve the quality of education for the youngsters?" 
6 percent said, it helps in many ways; 39 percent said, generally 
positively; it helps some, but in some ways it hurts; 17 percent said, 
it has no effect; 16 percent said, more negatively than positively; 
helps some but hurts more; 2 percent said, it hurts in many ways. 

We noticed differences in the response of teachers in different 
types of mathematics courses. As indicated in table 5.5, teachers 
describing themselves primarily as geometry teachers appeared 
considerably more troubled by the effects of standardized testing 
than did other mathematics teachers. Teachers whose careers 
varied in length did not respond differently on the item summarized 
in table 5.5. Except for the eight teachers who had taught more 
than thirty years, whose median was a favorable-to-testing +1.0, the 
teachers (in five-year experience groups) had medians right at +0.6, 
the grand median. 

The responses summarized in the preceding paragraphs indi- 
cate that the teachers found standardized achievement testing in 
their schools no big problem. They found it adequately aligned with 
their teaching. About half of them recognized that the tests only 
partially covered what they were teaching. Most saw some use in 
the test information. In an earlier survey, Paul Theobald and I 
(1991) had used many of the same questions with a larger group of 
elementary school teachers. Their responses were very much the 
same. During that survey, we talked personally to many of the 
respondents and found them reluctant to criticize the tests. For 
example, several said they were grateful that a wayward colleague 
was forced to bring his or her teaching into line. From many sources 
(Darling-Hammond, 1991; Shepard, 1991; Smith, 1991), we had 
heard that teachers were upset by the intrusions of testing on 
content and schedule, but no more than one out of eight of our 
respondents voiced vigorous protests against testing. They were 
uncritical also of the external definition of curriculum and standard 
setting. They appeared content to teach what outside authorities 
were telling them to teach. 

Changes in the Schools Attributable to Testing 

We asked the same mathematics teachers about changes taking 
place in their schools, intending later to link some changes to 
testing. Of the twenty-six we suggested, the following were checked 
to indicate the most common changes in instructional conditions 
in these schools. 



We are seeing a gain in emphasis on problem solving and critical thinking.  59%
Advanced courses in high school math are increasingly important here.  49%
Generally there is a broadening of the math curriculum.  47%
There is increasing clarity as to what is to be taught in my math classes.  46%
Teachers are increasingly required to pursue math goals defined by the district.
Attention is increasingly given to differences in individual students.  45%
The marginal learner is increasingly the target for deciding the level to teach at.  42%
Class time spent preparing for tests is increasing.  40%
Our understanding of how much math our students know is increasing.  40%
We are increasing the time we spend on teaching basic math skills.  39%



The high school mathematics teachers surveyed indicated 
that the changes occurring in their schools were what most of us 
would consider "changes for the good." The most frequently 
mentioned change was the new emphasis on problem solving and 
critical thinking. Almost 40 percent noted a continuing increase of 
emphasis on basic skills. Such an increase is a bit hard to compre- 
hend because that emphasis has been strong in most schools for 
twenty years. It is also a bit hard to interpret the finding that, with 
all of the emphasis on district goals and curriculum specification, 
there is increased attention to individual student differences. It 
might be that more attention is being given to how students score 
on tests than to the different interests and experiences students 
have. An increase in class time spent for preparation for standard- 
ized testing was one of the most frequently noted changes by these 
teachers. 

We thought it interesting to note also the items checked by 
these same 176 mathematics teachers as less frequent changes: 

Teachers increasingly utilize those spontaneous "teachable moments."
The quality of our math program is less well understood by administrators.
The quality of our math program is better understood by administrators.
Teachers are increasingly free to pursue the math goals they see important.
Teaching by drawing from the teacher's personal experience is less common.
Teaching by drawing from the teacher's personal experience is more common.
We are diminishing the time we spend on teaching the basic math skills.
Generally there is a narrowing of the math curriculum.
There is increasing confusion as to what is to be taught in my math classes.  21%
Attention is decreasingly given to differences in individual students.
Advanced courses in high school math are becoming less important.
We are seeing a drop in emphasis on problem solving and critical thinking.
The marginal learner is decreasingly the target for deciding the level to teach at.
Teachers decreasingly utilize those spontaneous "teachable moments."  14%
Our understanding of the math our students know is actually diminishing.  13%
Class time spent preparing for tests is diminishing.  12%

The teachers had numerous opportunities to indicate the 
negative forces at work in their schools but did not. More exactly, 
only 10 to 20 percent of the teachers reported such developments. 
It might be concluded that most teachers saw school reform efforts 
as working. It could, of course, also have been that the teachers did 
not see a need for reform. We did not ask them that. 

One of the worst fears about testing is that it will narrow the 
inventory of teaching to those topics and operations included on 
standardized tests. It is well known that test authors try to avoid 
content not taught in all schools, thus keeping their tests "fair." 
Almost 80 percent of the teachers here rejected the opportunity to 
indicate that the mathematics curriculum was narrowing. 

Relatively stable conditions (not checked pro or con) were 



Utilization of "teachable moments." 52% 

Drawing from the teacher's personal experience. 51% 

Understanding how much math the students know. 49% 

Class time spent preparing for tests. 48% 

Targeting teaching on needs of the marginal learner. 44% 



These responses suggest that the teachers did not see themselves 
as losing control of what happens in the classroom. On 
another matter, half the teachers noted that (even with the stronger 
districtwide and national emphasis on testing, reported earlier) 
they are not learning more about how much mathematics the 
students know. And on still another matter, unlike the 40 percent 
of teachers who saw increased time being spent preparing for tests, 
48 percent of the teachers did not indicate such change. 

Now we come to changes attributed to testing (by teachers 
noting that change is at least partly attributable to the school's 
emphasis on testing): 

Teachers are increasingly required to pursue math goals defined by the district.  80%
We are increasing the time we spend on teaching basic math skills.  79%
Class time spent preparing for tests is increasing.  64%
Generally there is a narrowing of the math curriculum.  73%
There is increasing confusion as to what is to be taught in my math classes.  70%



It is important to recognize that these testing-attribution 
percentages are based on teachers reporting the change. In the first 
of the five changes (regarding district goals), 83 of the 176 teachers 
reported increased requirements; 66 of those 83 (80 percent) attrib- 
uted the change at least in part to the emphasis on testing. This does 
not represent a majority of all teachers who claimed testing is 
having such an effect; it does represent a majority of those reporting 
increased emphasis on district mathematics goals saying so. 
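
The distinction drawn above, between the share of change-reporting teachers who credit testing and the share of all respondents who do so, can be checked with a few lines of arithmetic. The sketch below is an illustration only: it uses the counts given in the text (176 usable respondents, 83 reporting increased district requirements, 66 of those attributing the change to testing), and the function and variable names are hypothetical.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


respondents = 176            # usable questionnaires
reported_change = 83         # teachers reporting increased district requirements
attributed_to_testing = 66   # of those, teachers attributing the change to testing

# Base 1: only the teachers who reported the change (the figure in the table).
among_reporters = share(attributed_to_testing, reported_change)

# Base 2: all respondents (a minority, as the paragraph points out).
among_all = share(attributed_to_testing, respondents)

print(f"{among_reporters:.0f} percent of teachers reporting the change")
print(f"{among_all:.1f} percent of all respondents")
```

On the first base the figure rounds to 80 percent; on the second it is 37.5 percent, which is why the attribution percentages should not be read as majorities of the whole sample.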

According to a similar minority of teachers, testing is also 
contributing to the emphasis on basic skills and the narrowing of 
the curriculum. There is even a perception that testing increases 
the confusion as to what is to be taught. 

The curriculum is constantly changing, and many different 
social forces and bodies contribute to that change (Freeman et al., 
1983b; Saylor, 1982). Often the changes are not planned, but are 
reactive. Increasing criticism in the news media and the concern of 
employers about the competence of graduates have caused teachers 
to narrow the range of learnings for which they are responsible. 
Seldom are philosophers and educational researchers mentioned as 
authorities on what education should be. Changes in mathematics 
curricula occur increasingly in response to public and political 
expression. 

Many people hope that clarity about education will come 
with increased standardized testing. For more than fifteen years, 
the states have been mandating achievement testing not so much 
because it provides useful information but because it accelerates 
change in educational management (Popkewitz, 1981; Glass & 
Ellwein, 1986). Testing is supposed to set the classroom teacher 
straight. I am confident the curriculum has been affected but have 
found it difficult to measure the amount and direction of change 
(Stake & Theobald, 1991). Whether standardized test scores are 
going up or going down has little to tell us about what is happening 
to education. 

During the survey just described, the attitudes of teachers 
appeared to me blasé. The teachers acknowledged that the inventory 
of mathematics tested was much smaller than the inventory 
of mathematics to be taught, yet they did not express concern that 
these achievement tests provided a limited view of what their 
students were achieving and not achieving. They seemed to believe 
that achievement testing was valid if it identified a generalized 
gradation of mathematics performance among students, thus indicating 
the "mathematics aptitude" of each student. They had not 
come to expect the tests to indicate whether or not the students had 
actually achieved the many curricular components that they, as 
they taught, treated as separate and important. They were not upset 
by such test invalidity; they had come to accept it. Therefore, these 
mathematics teachers, at least the 45 percent who considered the 
tests as valid and important, were increasingly agreeable to one of 
the main administrative purposes of testing, to change the views of 
the teachers as to what should be taught. 



CONCLUSION 

We would like our young people, at least in a few areas, to 
experience the richness and depth of mathematics (de Lange, 
1989; Haertel & Calfee, 1983; Romberg, Zarinnia, & Collis, 
1989). Instead, as we move toward the challenges of the new 
century, the American mathematics curriculum has little depth 
compared to that offered in schools in other industrialized 
nations (McKnight et al., 1987). Though often well intended, 
political and technical pressures to specify, standardize, and 
assess student learning appear to drive the curriculum further 
toward the shallows. Many mathematics teachers do not recognize 
the drift toward oversimplification. 

It has not helped to upgrade education to specify academic 
skills and curricular topics as standards for all to master. There are 
other roads to reform, some that offer increased opportunity for 
children to experience intellectual problems, to voice perplexity, 
and to propose explanation. Many of us see it as essential that 
individual children be helped to relate their studies to personalized 
(uncommon) experience. In trying to raise standards, state school 
reform efforts have relied excessively on common goals and common 
test performance. We could not do without common aspirations 
and expectations, but there is a profound need for unique 
teaching and allowance for personal interpretation by each child. 
Many of our teachers are capable of providing it and do. The 
overemphasis on common goals diverts their efforts. 

My field studies of American classrooms (Stake & Easley, 
1978; Stake et al., 1986; Stake, Bresler, & Mabry, 1991) indicate that 
the American teacher remains a major asset, not as capable as we 
would like, not all that children deserve, but largely pleasing to the 
local community and school authorities, more the artist and even 
more the technician than reformist agitation suggests. Most teach- 
ers have heard the calls for reform, are sympathetic to them, have 
helped initiate some, and are hopeful of contributing to improved 
student assessment. Many are troubled when instructional time is 
diverted to preparation for testing. Most do not see that mandated 
assessment already is changing the nature of education in America. 

Education is being redefined. Though the phenomenon is 
difficult to measure, standardized testing — intentionally, with a 
noticeable effect that is often harmful — does change education 
(Haney & Madaus, 1986; Shepard, 1991; Smith, 1991; B. Wilson & 
Corbett, 1991). As detailed in the previous section, teachers report 
that with increased testing and curriculum standardization, they 
attend more to the so-called basics (the most elementary knowledge 
and skills) and attend less to achieving deep understanding on 
the part of their students of even a few topics. According to George 
Madaus (1991), the dangers in current school reform are several: 
overstandardization, oversimplification, overreliance on statis- 
tics, student boredom, increased numbers of dropouts, a sacrifice of 
personal understanding, and probably, a diminution of diversity in 
intellectual development. Further emphasis on standardized test- 
ing increases the risk. 

Traditionally, education has been a matter of understanding 
based on knowledge, with each person's knowledge and under- 
standing different because of the impossibility (and undesirability) 
of completely shared experience. To deal with the complexities of 
education for both the few and the masses, not only schools but 
school systems were created. Just as the budget of schools is high 
on the list of social costs, the management of schools is one of the 
most comprehensive of collective endeavors. As the authority and 
accountability of the schools are challenged, school officials look 
for additional ways of exerting control, not only over learners but 
over teachers, parents, and taxpayers. The mathematics teachers 
surveyed for this monograph confirmed that testing is an instru- 
ment of management. Those who control the tests have much 
to say about the definition of learning, teaching, and education 
(Darling-Hammond & Wise, 1985). 

The ostensible purpose of achievement testing is to measure 
the learning of students. To an extent, this occurs. In measuring 
mathematics learning, stable rankings among students are ob- 
tained. These rankings are not very different from those that 
teachers have generated informally and more extensively on the 
basis of classroom observations and assignments. Test scores 
confirm and authenticate pedagogical assessments. As to what 
mathematical knowledge has been attained, however, standardized 
achievement tests provide very little valid information. 

Test scores are a stable ground for comparing schools and 
nations; unfortunately, those comparisons are often distractive, 
sometimes pernicious. As illustrated in this chapter, they turn 
teachers and administrators to a lesser task. Comparisons make 
some deficiencies public but only on the rarest occasion do they 
show weaknesses not already recognized by teachers and represen- 
tatives of the public. Seldom do they provide insight or diagnostic 
remedies for the deficiency. Nor are achievement test scores 
indicators of the quality of teaching. 

To a certain extent, standardized achievement tests should be 
aligned with the curriculum as planned and taught. The tests 
should be in harmony with the expectations of parents and the 
state. Obviously, there is no way to match fully these at least 
somewhat disparate obligations and expectations. We may some- 
day improve the technology of representing curricular priorities, 
recognizing with precision what different people want teaching to 
be, but any reduction in disparity is more likely to be a matter of 
oversimplifying wants than of drawing teachers and others into 
consensus. 

An important reason for the lack of alignment between 
mathematics teaching and standardized achievement testing is the 
brevity of the tests. Contrary to popular and technical opinion, the 
items on the test do not nicely represent classroom teachings, even 
though test items and an abundance of classroom exercises fall 
within the same goal statement or are classified in the same topical 
category of mathematics. But the main reason for lack of alignment 
is that the inventory of mathematics taught by good teachers and 
poor teachers alike is hundreds of times greater in detail than the 
mathematics on the test. 

Because of the high correlation among student performances 
on examinations, the test scores provide a stable indicator of some 
generic notion of mathematics ability. Based on test scores, the 
predictions that mathematics teachers make about performance in 
subsequent academic situations will often be valid. The assess- 
ment that teachers make (by looking at standardized achievement 
test scores) about the mathematics that students have learned will 
seldom be valid. In that sense, there is a fundamental invalidity to 
standardized testing in mathematics. 
25. Now look back at the check marks again. We want to identify the one positive condition in that list of statements that you see as the most impor-

26. Now pick the one condition in your school for which testing is making the most negative contribution. Put the number for that statement in this 
NOTES 



1. Extensive survey data will be presented in the third section of this 
chapter. 

2. The current NCTM Standards (1989) are part of a long-running 
campaign by mathematics teacher educators to get teachers to conceptu- 
alize less according to topical content and more according to problem 
solving and experiential learning (Carl, 1991; Biggs & Collis, 1991). The 
purpose of this chapter is not to argue for one or the other but to examine 
test validity in terms of teacher and test-developer conceptualizations of 
mathematics achievement. 

3. In speaking of the vast and detailed content that mathematics 
teachers bring to the classroom, I do not mean to say that as a group they 
put content learning higher than other learning. Clearly, teachers differ. 
My own acquaintance with mathematics teachers is nicely reflected in the 
work of sociologist Robert Connell (1985), who found that teachers tend 
to prefer one of four emphases: intellectual growth, personal development, 
skill learning, and honoring custom. Those holding intellect in highest 
esteem take special pains in choosing content to teach but, whether 
articulated or not and whether sophisticated or not, all teachers have 
elaborate conceptualizations of subject matter. 

4. Such discourse is at the heart of some definitions of teaching. 
Speaking of the teacher, Sylvia Ashton-Warner (1967) said: "From the 
teacher's end it boils down to whether or not she is a good conversation- 
alist; whether or not she has the gift or the wisdom to listen to another; the 
ability to draw out and preserve that other's line of thought." 

5. Artificial means that it is a construction of human interpretation 
and judgment, not a direct inventory or an objectively derived calculation 
from direct measurements. Mathematics achievement as a construct is 
artificial because it alludes to a body of material only vaguely specified and 
largely intuited. Artificial constructs are central to all science. Such 
constructs as "energy" and "susceptibility to disease" are sometimes 
objectively defined but are used intuitively and practically by scientists 
and others. The value of artificial constructs to practitioners depends on 
how well rooted the concept is in action and discourse. 

6. Science, particularly inductive science, is built upon constructs, 
aggregating through relationships into theories. Testing researchers such 
as James Popham (1987) and Edward Haertel and David Wiley (1990) speak 
of domains, traits, and achievements with the confidence that these 
constructs interchange with the constructs educators develop from expe- 
rience. The more artificial the construct, the greater is the need for 
validation, by researcher and educator alike. 

7. The emphasis here is on knowledge. Most psychologists prefer to 
identify the selection of mathematics learned as made up of abilities or 
competencies. Haertel and Wiley, for example, said, "In this paper the term 
'ability' encompasses that which is commonly classified as knowledge and 
skill" (1990). Such terms draw one toward a concept of education as a 
collection of skills and away from thinking of education as understanding 
of knowledge. Both are part of education, but the two definitions move 
thinking about education in different directions. 

8. To appreciate the complexity and lack of interdependence, one 
might think of an inventory of mathematics the same way one thinks of 
physical surfaces of land mass. Two-dimensional space could represent 
knowledge and skill and the elevation could represent conceptual attain- 
ment by an individual or group. To each square kilometer, we could assign 
a learning task. The entire plot might cover a territory as large as a country. 
For one person, achievement across tasks might be as irregular as the 
terrain of Switzerland; for another it might be as flat as Holland. Predic- 
tions of ground elevation from one part of the country to another would be 
risky. One cannot indicate ground elevation of a base camp on the 
Matterhorn from knowledge of elevation of the railway station in Zurich. 
And one does not have a very good idea of the elevation of all of Switzerland 
by sampling elevation at thirty points. Elevation is similar for nearby tasks 
but attainment of distant tasks is unpredictable. 

9. False precision is regularly implied by teachers who grade in 
percents, implying that 100 percent correct refers to a totality meaningful 
beyond the items actually administered. 

10. Decades ago, most psychometricians abandoned testing for "in- 
telligence" and reconceptualized their target as "scholastic aptitude." Still 
a form of intellectual power, scholastic aptitude indicates predictable 
relative achievement on common classroom assignments. Mathematics 
ability is a specific scholastic aptitude. 

11. People will disagree as to who the delinquent teachers are. 
Shortcomings in subject matter competence, behavior control, punctual- 
ity, dress, and test-score production, any one, or more, can be excused by 
some people when other qualifications run strong. Evaluation of teacher 
merit is not just a measurement problem; it is confounded by ideological 
diversity in the school and in the community (Simons & Elliot, 1989; 
Stiggins & Duke, 1988). 

12. Moment by moment, through the day, many students persist in 
the view that there is little of importance in what the teachers are teaching 
today. "If it turns out to be important, I can get it later." The indignation 
of many parents about the schools, often with cause, feeds the youngsters' 
resistance to being taught. 

13. I note also the shortcomings of educational researchers, but to 
list them in this sentence would imply that they influence what happens 
in schools. 

14. Carl Bereiter (1991) has given us opportunity to rethink the 
question of how teachers can be effective contributors to student achieve- 
ment even when they cannot vocalize the rules by which they teach. 

15. According to novelist Robertson Davies, the ability to withhold 
authority and correction is a key talent for teachers. In The Rebel Angels 
(1981, p. ), he said: "Only those who have never tried it for a week or two 
can suppose that the pursuit of knowledge does not demand a strength and 
determination, a resolve not to be beaten, that is a special kind of energy, 
and those who lack it or have it only in small store will never be scholars 
or teachers, because real teaching demands energy as well. To instruct calls 
for energy, and to remain almost silent, but watchful and helpful, while 
students instruct themselves, calls for even greater energy. To see some- 
one fall (which will teach him not to fall again) when a word from you 
would keep him on his feet but ignorant of an important danger, is one of 
the tasks of the teacher that calls for special energy, because holding in is 
more demanding than crying out." 

16. Research on the automation of education has been summarized 
by Roy Pea and Elliot Soloway (1987). 

17. Swedish researcher Ulf Lundgren (1972) conceptualized the con- 
ditions of instruction monitored by teachers. Seymour Sarason (1971) 
wrote cogently on informal assessment of social conditions in the class- 
room. John Goodlad wrote about the need for teachers to recognize the 
quality of student work (1990). 

18. Reflective teaching has been described by Donald Schon (1982), by 
Peter Grimmett and Gaalen Erickson (1988), and by Max van Manen 
(1991). 

19. Some scholars also call for more emphasis on a "core" curriculum. 
See Fenstermacher and Goodlad (1983). 

20. To speak of teaching as an art is not to treat it as casual, contrived, 
or without standards. Madeleine Grumet (1988, p. 128) notes, "If we think 
of teaching as an art, then we have a responsibility to be critic as well as 
artist. To teach as an art would require us to study the transferences we 
bring to the world we know, to build our pedagogies not only around our 
feeling for what we know but also around our knowledge of why and how 
we have come to feel the way we do about what we teach." 

21. Speaking of mathematics education in 1967, Robert Davis said, 
"we live in an age when the best practice of the best practitioners almost 
certainly lies ahead of the best theory of the best theorists" (p. 59). 

22. Figures 5.3, 5.4, and 5.5 are not research findings; they were 
drawn from my impressions of what experienced teachers do, not directly 
from observational data. When asked, the teachers seldom claim to be 
involved in such detailed analyses. And yet the effects of such elaborate 
selection of content can be observed. The point is that the intuitive 
working of teachers is highly complex, with far greater texture than the 
goals stated in table 5.2. 

23. The learning terrain for each child, of course, is different. 

24. Some of the best works to date are Rosalind Driver, 1973; Bob 
Godwin, 1990; D. H. Jonassen, 1982; Takahiro Sato, 1991; School Math- 
ematics Study Group, 1961. These works analyze either instruction, 
epistemology, or cognitive development; they do not adapt nicely to the 
"conversational" exchanges of the American classroom. 

25. Other than to note Easley's statement, "it seems absurd to pretend 
that one knows how to measure cognitive competences by administering 
standardized lists of questions when no validating clinical interviews- 
and certainly no explicit structural analyses— have been published" (1974, 
p. 28), I will not use this chapter on validity to examine the inability of 
achievement tests to represent student mathematical thinking. Here I 
concentrate on the disparity between the inventory of achievement 
conceptualized by mathematics teachers and the collection of mathemat- 
ics aptitude items pooled by test makers. 

26. Interpreting performance with reference to how other examinees 
perform, usually using "percentile ranks," is called norm referencing. The 
sophistication of test technology for norm referencing is very high. But, as 
Robert Glaser (1963), Jason Millman (1974), James Popham (1980), and 
many educational researchers have said, instruction needs "criterion 
referencing," interpretation with strong reference to the content of the 
task. With course content much more rooted in teachers' informal 
conceptualization than in formal epistemological analysis, the sophistica- 
tion of criterion-referenced test technology is not very high. 

27. Because item difficulties for individual persons are much less 
stable, this reasoning makes even less sense for interpreting an individual's 
score. 



28. More than in any other subject-matter field, curriculum develop- 
ers in mathematics have classified content, problems, and skills as to both 
what might be taught and what should be taught. In their communication, 
district supervisors and curriculum committees have tended to use only 
the broad headings (roughly equivalent to textbook chapter titles), classi- 
fications that are much less detailed and without the interdependencies 
that characterize teacher conceptualizations of what needs to be taught 
(Baker, 1989; Darling-Hammond, 1990). 

29. In an unpublished note, Lawrence Stenhouse wrote, "Good teach- 
ers are necessarily autonomous in professional judgment. They do not 
need to be told what to do. They are not professionally the dependents of 
researchers or superintendents, of innovators or supervisors. This does not 
mean that they do not welcome access to ideas created by other people at 
other places or in other times. Nor do they reject advice, consultancy or 
support. But they do know that ideas and people are not of much real use 
until they are digested to the point where they are subject to the teacher's 
own judgment. In short, it is the task of all educationalists outside the 
classroom to serve the teachers; for only the teachers are in the position to 
create good teaching." 

30. Which was one of the "anticipated student outcomes" for sixth 
graders in Duxbury. 

31. Ken Komoski of the EPIE Institute has been a pioneer in develop- 
ing an alignment technology. 

32. Look at the Vietnam war. The deceit of waging war by strategic 
use of statistical indicators was illustrated by Neil Sheehan (1988) in his 
biography of Colonel John Vann. Look at Detroit. According to David 
Halberstam (1986), the loss of American dominance in the automobile 
market came about when economists and bankers replaced car makers as 
chief executive officers. Although spokespersons for business and industry 
have been strong allies of reform in American schools, some observers of 
the workplace are urging that worker empowerment, a "real" conceptual 
role for workers, increases productivity and corporate health (Peters & 
Waterman, 1984). Effective teacher-proof management of course content 
is a pipe dream. 

33. One of the finest efforts toward a science of education occurred in 
the 1970s when Lee Cronbach (1977) and Richard Snow (Snow & Mandinach, 
1991) of Stanford attempted to pin down relationships among aptitudes (as 
indicated on tests) and pedagogical strategies. For example, did certain 
children learn better through practical application whereas others got 
more from abstract explanations? The findings from that work, unfortu- 
nately, warranted little teacher study or technical investment. 

34. See P. Tres-Brevig, 1993. 

35. A norm-referenced orientation to test development creates a 
population of examinees whose distribution of scores provides rankings for 
interpreting each score. The commonly used standardized achievement 
tests are norm-referenced tests. A criterion-referenced orientation to test 
development emphasizes individual performances on individual test items 
selected because, standing alone, that performance is worth pondering. 
Writing a check a bank would cash, writing an essay, and assembling an 
engine are such performances. 

36. For a close look at the validity of school means, see Madaus et al., 
1979. More recently, officials at even the poorest schools found that using 
the same tests year after year, omitting students with learning deficien- 
cies, and coaching for the tests resulted in district scores above the national 
median (Linn, Graue, and Sanders, 1990). 

37. "Comparisons make sense only when they are put in the context 
of the entire character of the species concerned and of the known principles 
governing resemblances between species" (Midgley, 1978, p. 24). For our 
purposes, substitute the word schools or nations, for species. 

38. The modified instrument and teacher responses are shown in the 
appendix. Also see Stake & Theobald (1991). 

39. Arizona, Connecticut, Idaho, Illinois, Kentucky, Nebraska, 
New York, South Carolina, Washington, Wisconsin, West Virginia, and 
Wyoming. 

40. Hoping for a much higher return rate, we had kept the questionnaire 
short and had urged teachers to respond to and send back even a part of it. 
Having allowed anonymous responses, we could not therefore send follow- 
up reminders. On the summary item indicated in table 5.5, we calculated the 
median for those responding in the first three weeks as +0.79, in the second 
three weeks as +0.46, and for the remainder as +0.56. With these medians, 
it is reasonable to believe that those who did not respond at all might have 
been a little more opposed to testing than our respondents were. 

41. Response options checked are shown here in italics. 

42. Thomas Hastings, Philip Runkel, and Dora Damrin (1961) advised 
cautiousness in reading teachers' descriptions of test use. Their survey 
responses indicated a respectable utility, but in probing during personal 
interviews, they found the testing to be legitimation for decisions more 
than important information for decision making. See also the work of Paul 
LeMahieu (1984) and Romberg, Zarinnia, and Williams (1989). 

43. This and the following list consist of a total of twenty-six items 
to each of which teachers could respond or not. Therefore, percentages in 
the columns will not sum to 100 percent. A third list indicates those issues 
not checked; that is, considered not to be changing. 

44. Teachers have long been dubious about claims for elevating 
student achievement via stronger control from central administrators. In 
1981, the state of Florida had perhaps the most aggressive state testing 
program in the nation. State Superintendent Turlington repeatedly indi- 
cated that Florida teachers were solidly behind the testing program. As a 
national evaluation team that happened to be studying sex equity educa- 
tion in the nation's tenth largest school district, the Broward County 
schools, we asked a 15 percent sample of teachers: "In this district's 
schools, how much are the following interfering with students getting a 
good education: (a) racial discrimination; (b) discrimination according to 
sex; (c) bilingual problems; (d) overemphasis on testing." About half the 
teachers indicated that testing was an interference, more so than racial 
discrimination, gender discrimination, or bilingualism. (Reported in Stake, 
Stake, Morgan, & Pearsol, 1983, p. 221.) 

45. It is not America alone that is unhappy with its schools. We 
should not be reluctant to look at how (in the early 1990s) educational 
authorities around the world are reforming education. Distressed by strict 
controls from Stockholm, the Swedish people moved Parliament to dis- 
miss the 800-person National Board of Education and replace it with an 
agency for supporting local educators. The Ministry of Education of 
Victoria, Australia, has created a system for deciding who shall go to 
college, using teacher assessment of standardized projects and portfolios. 
The United Kingdom has piloted "standard assessment tasks," but finds 
the load on teachers for marking excessive. The Province of Ontario 
continues to revise its curriculum along lines supported by teacher unions 
without reliance on state or federal testing. Some ministries seek to draw 
from science and technology without undermining existing pedagogical 
arts; others do not. 

46. The words of Andrew Porter are instructive: "Simply telling 
teachers what to do is not likely to have the desired results. Neither is 
leaving teachers alone to pursue their own predilections. But it might be 
possible to shift external standard setting away from reliance on rewards 
and sanctions (power) and toward reliance on authority. One approach to 
building authoritative standards would be to involve teachers seriously in 
the business of setting standards. Through the process of teacher partici- 
pation, the standards would take on authority" (1989, p. 354). The way to 
involve teachers seriously is to observe what they do rather than ask them 
what teachers should do. (See also Daniel Koretz, 1987; Ann Lieberman, 
1988; National Council of Teachers of Mathematics, 1989; and Harry 
Torrance, in press.) 
6 ❖ Assessment Nets: An Alternative Approach 
to Assessment in Mathematics Achievement 

Mark Wilson 



New views of learning have implications for the monitoring and 
assessment of student learning (Webb & Romberg, 1992). They 
suggest that we focus on measuring the understanding and models 
that individual students construct during their learning process 
(Masters & Mislevy, 1992; Wilson, 1992b). In many areas of 
learning, and in mathematics in particular, levels of achievement 
may best be defined and measured not in terms of the number of 
facts and procedures that a student can reproduce (i.e., test score as 
counts of correct items), but in terms of the best estimates of his or 
her level of understanding of its key concepts and principles 
(Masters, Adams, & Wilson, 1990; Wolf, Bixby, Glenn, & Gardner, 
1991). Moreover, we may need to estimate these levels using 
several types of information from a single complex performance 
(e.g., levels of correctness, strategy-use, latency), and we may need 
to incorporate several perspectives on the types of information 
(e.g., the student's perspective, the teacher's perspective, an expert 
opinion). Obtaining these types of information and perspectives on 
the information will require new approaches to assessment. 

This chapter describes an assessment net that is an alterna- 
tive form of assessment for mathematics education. An assessment 
net is composed of (1) a framework for describing and reporting the 
level of student performance, (2) a means of gathering information 
based on observational practices that are consistent with both the 
educational variables to be measured and the context in which that 
measurement is to take place, and (3) a measurement model that 
provides for appropriate forms of quality control. 

To date, work in the area of performance assessment has 
addressed only one portion of an "assessment system" (Linn, 
Baker, & Dunbar, 1991), observational design, and emphasized 
instructional validity (Wolf, Bixby, Glenn, & Gardner, 1991). For 
example, a recent issue of a journal concerned with measurement 
in education was devoted to performance assessment (Stiggins & 
Plake, 1991), yet only one article dealt substantively with issues 
other than information gathering. 

At this time we do not have a comprehensive methodology for 
performance assessment. The complexity of performance assess- 
ment appears to challenge both the philosophical foundations 
(Shepard, 1991) and the technology (i.e., the measurement models) 
of standard educational and psychological measurement. In con- 
trast, the rival of performance assessment, standardized multiple- 
choice testing, appears to many to be part of a coherent system of 
assessment (APA, AERA, NCME, 1985) that ensures quality 
control by addressing item test construction, pilot testing, reliabil- 
ity, validity, and reporting schemes. The aim of the assessment net 
is to build a system that has the coherence of the traditional testing 
approaches but addresses new issues brought forward by the perfor- 
mance assessment movement. 



FRAMEWORK 

The assessment net begins with the idea that what we want to 
assess is student progression along the strands of the curriculum. 
This progression must reflect a shared understanding on the part of 
the users of the net. That understanding must include a notion of 
progression, an agreed-upon set of important strands, an agreed- 
upon set of levels of performance along the strands, and an accep- 
tance that this progression is a tendency but not an absolute rule. 
A framework for a particular curriculum area defines levels of 
performance that students would be expected to achieve. The 
levels extend from lower, more elementary knowledge, under- 
standing, and skills to more advanced ones. They describe under- 
standing in terms of qualitatively distinguishable performances 
along the strands or continuum. 

The idea of a framework is not new. Related notions have been 
developed in many parts of the world: the Western Australia First 
Steps project (Ministry of Education, 1991), the Australia National 
Curriculum Profiles (Australia Education Council, 1992), and the 
UK National Curriculum strands (Department of Education and 
Science, 1987a, 1987b). The California Framework in mathematics 
(California State Department of Education, 1985) is composed of 
strands or continua in number, measurement, geometry, patterns 
and functions, statistics and probability, and algebra. Within each 
of these strands, four broad levels of performance are defined: (1) 
kindergarten to grade 3, (2) grades 3 to 6, (3) grades 6 to 8, and 
(4) grades 9 to 12. A list of goals is defined within each strand for each 
level. For example, within the geometry strand, at the lowest level, 
one of the goals is "Use visual attributes and concrete materials to 
identify, classify, and describe common geometric figures and mod- 
els, such as rectangles, squares, triangles, circles, cubes, and spheres. 
Use correct vocabulary" (California State Department of Education, 
1985, p. 24). At the grades 3-6 level, one of the goals is "Use 
protractor, compass, and straightedge to draw and measure angles 
and for other constructions" (California State Department of Educa- 
tion, 1985, p. 27). And at the grades 6-8 level, one of the goals is 
"Describe relationships between figures (congruent, similar) and 
perform transformations (rotations, reflections, translations, and 
dilations)" (California State Department of Education, 1985, p. 32). 
Each level is associated with a set of special concerns and emphases 
for that particular period of schooling, such as, at the base level, an 
emphasis on concrete materials and classification. 

The Vermont statewide assessment for fourth and eighth 
grade students includes standard tests and portfolios in mathemat- 
ics and writing. Students' mathematics portfolios are rated on the 
seven criteria shown in figure 6.1: understanding of task, quality of 
approaches and procedures, decisions along the way, outcomes of 
activities, language of mathematics, mathematical representa- 
tions, and clarity of presentation (Vermont Department of Educa- 
tion, 1991). 
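
As a rough illustration of how such a rating scheme might be recorded, the sketch below stores one rating per criterion on the 0 to 3 scale listed in figure 6.1 and returns the result as a profile rather than a single total. The criterion names come from the text; the function name, the data layout, and the decision not to sum the ratings are assumptions made here for illustration, not a description of Vermont's procedure.

```python
# Hypothetical sketch: criterion names follow the text; identifiers are illustrative.
VERMONT_CRITERIA = (
    "understanding of task",
    "quality of approaches and procedures",
    "decisions along the way",
    "outcomes of activities",
    "language of mathematics",
    "mathematical representations",
    "clarity of presentation",
)
RATING_SCALE = (0, 1, 2, 3)  # the four levels listed per criterion in figure 6.1


def portfolio_profile(ratings):
    """Return the ratings as an ordered profile, one level per criterion.

    Raises ValueError if a criterion is missing, unknown, or rated off-scale.
    """
    unknown = set(ratings) - set(VERMONT_CRITERIA)
    missing = [c for c in VERMONT_CRITERIA if c not in ratings]
    if unknown or missing:
        raise ValueError(f"missing={missing}, unknown={sorted(unknown)}")
    for criterion, level in ratings.items():
        if level not in RATING_SCALE:
            raise ValueError(f"{criterion!r} rated {level!r}, expected 0-3")
    return [(c, ratings[c]) for c in VERMONT_CRITERIA]


# Made-up ratings for one portfolio, for demonstration only.
example = dict.fromkeys(VERMONT_CRITERIA, 2)
example["language of mathematics"] = 3
for criterion, level in portfolio_profile(example):
    print(f"{level}  {criterion}")
```

Keeping the seven ratings separate preserves the qualitative distinctions among criteria; whether and how they should be combined is a measurement question the sketch deliberately leaves open.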

The mathematics projects used in statewide assessment in 
Victoria, Australia, are a third example. Students in the final two 
years of high school undertake studies for a certificate issued by the 
Victoria Curriculum and Assessment Board. To satisfy the require- 
ments for the certificate, students complete twenty-four half-year 
units chosen from forty-four available areas of study. All students 
must complete a specified number of units in English, arts- 
humanities, and mathematics-science-technology. 

Students who take mathematics are required to complete a 
series of investigative projects, each involving at least seven hours 
of classwork. One of these projects, completed during the first half 
of the final year, is based on a theme set annually by the board. 
Teachers monitor and record the progress of each student's project, 
much of which must be completed during class time. The project 
report is submitted by a date specified by the board. The task is 
described as follows: "Students undertake an independent math- 
ematical investigation based on a single theme set annually by the 
state curriculum and assessment board. Students have four weeks 
1. Understanding of Task 
Sources of Evidence: Explanation of task; reasonableness of approach; correctness of response 
leading to inference of understanding. 
3 Generalized, applied, extended. 
2 Understood. 
1 Partially understood. 
0 Totally misunderstood. 

2. Quality of Approaches/Procedures 
Sources of Evidence: Demonstrations; descriptions (oral or written); drafts, scratch work, etc. 
3 Efficient or sophisticated approach/procedure. 
2 Workable approach/procedure. 
1 Appropriate approach/procedure some of the time. 
0 Inappropriate or unworkable approach/procedure. 

3. Decisions Along the Way 
Sources of Evidence: Changes in approach; explanations (oral or written); validation of final 
solution; demonstration. 
3 Reasoned decisions/adjustments shown/explicated. 
2 Reasoned decisions/adjustments inferred with certainty. 
1 Reasoned decision making possible. 
0 No evidence of reasoned decision making. 

4. Outcomes of Activities 
Sources of Evidence: Solutions; extensions: observations, connections, applications, syntheses, 
generalizations, abstractions. 
3 Solution with synthesis, generalization, or abstraction. 
2 Solution with connections or application(s). 
1 Solution with observations. 
0 Solution without extensions. 

5. Language of Mathematics 
Sources of Evidence: Terminology; notation/symbols. 
3 Use of rich, precise, elegant, appropriate mathematical language. 
2 Appropriate use of mathematical language most of the time. 
1 Appropriate use of mathematical language some of the time. 
0 No or inappropriate use of mathematical language. 

6. Mathematical Representations 
Sources of Evidence: Graphs, tables, charts; models; diagrams; manipulatives. 
3 Perceptive use of mathematical representation(s). 
2 Accurate and appropriate use of mathematical representation(s). 
1 Use of mathematical representation(s). 
0 No use of mathematical representation(s). 

7. Clarity of Presentation 
Sources of Evidence: Audio/video tapes (transcripts); written work; teacher interviews/ 
observations; journal entries; student comments on cover sheet; student self-assessment. 
3 Clear (e.g., well-organized, complete, detailed). 
2 Mostly clear. 
1 Some clear parts. 
0 Unclear (e.g., disorganized, incomplete, lacking detail). 

Figure 6.1. Criteria for mathematics portfolio. 
to develop a project topic based on the theme, collect data, and 
submit a written report. The project is expected to take between 15 
and 20 hours, with 7 to 10 hours during class time. Each student 
submits a written report of about 1,500 words emphasizing the 
mathematical aspects and results of the project. This task is 
undertaken in the middle of the school year. An initial grade is 
assigned by the school and assessments are subject to the verifica- 
tion procedures of the Board" (Victoria Curriculum and Assess- 
ment Board, 1990). 

Initial assessments of students' projects are completed by 
classroom teachers. To achieve comparability for the award of 
project grades across schools throughout the state, all teachers are 
provided with an assessment sheet listing a set of eighteen criteria: 

(a) Conducting the investigation 

1. Identifying important information 

2. Collecting appropriate information 

3. Analyzing information 

4. Interpreting and critically evaluating results 

5. Working logically 

6. Broadening or deepening the investigation 

(b) Mathematical content 

7. Mathematical formulation or interpretation of problem 
situation or issue 

8. Relevance of mathematics used 

9. Level of mathematics used 

10. Mathematical language, symbols, and conventions used 

11. Understanding, interpreting, and evaluating the math- 
ematics used 

12. Accuracy of mathematics used 

(c) Communication 

13. Clarity of aims of project 

14. Relation of project topic to theme 

15. Definitions of mathematical symbols used 

16. Account of investigation and conclusions 

17. Evaluation of conclusions 

18. Organization of material (Victoria Curriculum and Assessment
Board, 1990)

Teachers rate each student's project as high, medium, low, or not
shown on each of these eighteen criteria. The concrete meaning of
the levels comes from the way teachers observe students and rate
them.
A second class of assessment involves focused applications
that are based on particular pedagogic or developmental theories. 
These typically have definitions of levels that are not arbitrary; 
rather, they are based on a theory. Two such examples follow, one 
from a psychological research setting, and one from educational 
research. 

Siegler (1987) has described a study in which students were 
presented with a series of elementary addition problems and then 
were asked, "How did you figure out the answer to that problem?" 
Their answers were classified into one of five categories according 
to a scheme based on earlier research: 

1. Retrieval (R), where the student retrieves the answer from 
memory. 

2. Min strategy (M), where the student counts up from the larger 
addend the number of times indicated by the smaller addend. 

3. Decomposition (D), where the student transforms the original 
problem into two or more simpler problems. 

4. Counting-all strategy (C), where the student counts from one the 
number of times indicated by the sum. 

5. Guessing and "other" (G), where the student says that he or she
guessed or did not know the answer. 
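
The counting strategies in the list above differ in how much counting they require, which is one reason solution time and error rate vary by strategy. The following is a minimal sketch, not part of Siegler's study: the function names are hypothetical, and treating effort as the number of counting steps is an assumption made only for illustration.

```python
from typing import Optional


def counting_steps(a: int, b: int, strategy: str) -> Optional[int]:
    """Counting steps needed for a + b under a strategy code from the list above.

    Returns None for strategies that this sketch does not model as counting.
    """
    if strategy == "R":   # Retrieval: the answer is recalled, nothing is counted
        return 0
    if strategy == "M":   # Min: start at the larger addend, count on by the smaller
        return min(a, b)
    if strategy == "C":   # Counting-all: count from one up to the sum
        return a + b
    return None           # D (decomposition) and G (guessing) have no fixed count here


def add_by_min_strategy(a: int, b: int) -> int:
    """Carry out the Min strategy literally: count up from the larger addend."""
    total = max(a, b)
    for _ in range(min(a, b)):
        total += 1
    return total


if __name__ == "__main__":
    for a, b in [(14, 1), (3, 14), (16, 6)]:
        print(a, b, counting_steps(a, b, "M"), counting_steps(a, b, "C"))
```

Under this assumption the Min strategy depends only on the smaller addend, whereas counting-all grows with the sum, so the two strategies respond differently to the same problem.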

In his paper, Siegler makes the points that (a) dependent
variables such as solution time and error rate should not be
"averaged over" these strategies as they were in the past, an
approach that led to contradictory results in studies of addition,
and (b) students do not use one strategy exclusively, but tend to
show substantial variation. His analyses show clearly that some
strategies are "better" than others in that they are associated with
speed or a lower error rate. The literature provides evidence of a
developmental sequence among the addition strategies. Ashcraft
(1982) found that, although first graders are fairly consistent in
their use of the Min strategy, fourth graders consistently use
Retrieval, and third graders use a mixture of the two strategies.
Siegler challenged the interpretation of Ashcraft's results, but he
also found that frequency of use of strategies changes according to 
grade level. Although many questions about the use of strategies 
remain, most are best answered by considering strategies one at 
a time. Perhaps strategies will be seen as part of a strategy-use 
continuum that can summarize not only the development of 
strategies, but also any regularities in their distribution within 
and between individuals. 
Figure 6.2 shows the levels of strategy-use for addition problems
that are supported in empirical research (Wilson, 1992c). Note
that the ordering of the strategies is not linear: different strategies
may be equally "good" at a developmental level.

A second example uses phenomenography. Phenomenographic
analysis has its origins in the work of Marton (1981), who describes
it as "a research method for mapping the qualitatively different
ways in which people experience, conceptualize, perceive, and
understand various aspects of, and phenomena in, the world around
them" (Marton, 1986, p. 31). Phenomenographic analysis usually
involves the presentation of an open-ended task, question, or 
problem designed to elicit information about an individual's under- 
standing of a particular phenomenon. Often tasks are attempted in 
relatively unstructured interviews during which students are en- 
couraged to explain their approach to the task or their conception 
of the problem. Researchers have applied phenomenographic analy- 
sis to such learning areas as proportionality (Lybeck, 1981), number 
(Neuman, 1987), and speed, distance, and time (Ramsden, 1990). 

These studies found that students' responses reflect a limited
number of qualitatively different ways of thinking about a phenomenon,
concept, or principle (Marton, 1988). These outcome
categories are "usually presented in terms of some hierarchy: There
is a best conception, and sometimes the other conceptions can be
ordered along an evaluative dimension" (Marton, 1988, p. 195). For
Ramsden (1990), it is the construction of hierarchically ordered,
increasingly complex levels of understanding and the attempt to
describe the logical relations among these categories that most
clearly distinguishes phenomenography from other qualitative
research methods.

Figure 6.2. Strategy-use levels.

The six response categories in figure 6.3 describe a develop- 
mental understanding of atomic structure (Renstrom, Andersson, 
and Marton, 1990). "The six conceptions of matter should not be 
seen as a set of one correct and five erroneous conceptions. At each 
level some new insights are added that cumulate to the kind of 
understanding aimed at. All these various understandings are 
packed into 'the correct' understanding of matter. In a way, we can 
see our investigation as laying free or making visible the various 
tacit, taken-for-granted layers of the (scientific) understanding of 
matter" (p. 568). 

These authors go further in unpacking the developmental
understanding of matter and explain, using figure 6.3, how the
"scientific" understanding of atomic structure (Level F) builds
through the hierarchy of levels. In Level B, and above, there is an
understanding that substances exist in different forms and can
change from one state to another (e.g., solid to liquid). In Level C,
and above, there is a recognition of the existence of atoms (although
in Level C, itself, these are thought of as particles embedded in the
substance). In Level D, and above, there is a recognition that
substances themselves consist of particles (although in Level D
these are seen as infinitely divisible). In Level E, and above,
substances are conceptualized as consisting of particles that are not
infinitely divisible and that have attributes. And in Level F, there
is a focus on systems of particles in terms of which the
macroproperties of a substance can be understood.

F  The substance consists of systems of particles. Different macroproperties of the
   substance can be accounted for in terms of properties of particles and particle systems.

E  The substance consists of particles that are not divisible into other particles and that
   have certain attributes (such as form and structure) that may explain macroproperties of
   the substance.

D  The substance consists of infinitely divisible particles, which might not consist of the
   substance.

C  Small particles are introduced. They may be different from the substance in which
   they are embedded (which creates the potential for thinking of atoms, which are
   components of the substance but do not have its macroproperties).

B  The substance is delimited from other substances and it exists in more than one form
   (which creates the potential for thinking of phase transition).

A  The substance is not delimited from other substances and it lacks substance attributes.

Figure 6.3. Levels in understanding of atomic structure.



INFORMATION GATHERING 

New views of student learning demand information gathering
procedures that extend beyond the traditional standardized multiple-choice
tests. During the last decade, work on these procedures
has been called authentic, alternative, or performance assessment.
The key features of such procedures have been described by 
Aschbacher (1991, p. 276) as follows: 

1. Students perform, create, produce, or do something that requires
higher level thinking or problem solving skills (not just one right
answer).

2. Assessment tasks are meaningful, challenging, engaging,
instructional activities.

3. Tasks are set in a real-world context or a close simulation. 

4. Process and conative behavior are often assessed in place of, or 
as well as, product. 

5. Criteria and standards for performance are public and known in 
advance. 

Many of these features are not new (Stiggins, 1991). Forty years ago,
Lindquist (1951, p. 152; also quoted in Linn, Baker, & Dunbar, 1991)
wrote: "It should always be the fundamental goal of the achievement
test constructor to make elements of his test series as nearly
equivalent, or as much like, the elements of the criterion series as
consequences of efficiency, comparability, economy, and expediency
will permit." It is probably fair to say, however, that in the
interim years concerns with "efficiency, comparability, economy,
and expediency" have predominated. Multiple-choice tests have
been advocated widely because they possess these characteristics. It is
time to pay more attention to tasks that are valued because of their
close alignment to the criteria of greater instructional importance.
The alternative assessment movement reminds us that there are
many information-gathering formats. To facilitate discussion of these
formats, we use the ideas shown in the "control chart" in figure 6.4.
It does not describe all assessment types but, rather, assists us in 
describing several aspects of assessment that are relevant to this 
chapter. In the figure, the vertical dimension is used to indicate 
variation over the specification of assessment tasks. At the "high" end 
of this dimension we have assessment undertaken using externally set 
tasks that will only allow students to respond in a prescribed set of 
ways. Standardized multiple-choice tests are an example, whereas 
short-answer items are not quite at this extreme because students may 
respond in ways that are not predefined. The "low" end of this 
dimension is characterized by a complete lack of task or response 
specification. Teachers' holistic impressions of their students belong 
at this end of the task-control dimension. Lying between these two 
extremes are information gathering approaches, such as teacher- 
developed tests, and tasks that are adapted from central guidelines for 
local conditions. The horizontal dimension indicates control over 
judgment. The extremes are typified by machine scorability, at the 
"high" end, and unguided holistic judgments at the other end. Varia- 
tions between these relate to the status of the judge and the degree of 
prescription provided by judgment protocols. 
We have illustrated some examples of assessment formats
that can be arrayed along these dimensions in figure 6.4. At the top 
of the figure are standardized tests of various sorts. Multiple-choice 
(MC) tests are represented on the righthand side because they can 
be machine scored, but essay tests can be judged in a variety of 
ways, so they occupy a broader range on the left. The same 
horizontal classification can be used for teacher-developed tests, 
but they appear lower on the task-specification dimension. In the 
bottom lefthand corner are holistic teacher judgments that may, for 
example, be made from memory without reference to specific 
tasks. The extension of the shaded region to the right allows 
teacher knowledge of general guidelines to be incorporated into 
judgments. Another region on the figure is exemplified by the 
Australia National Profiles (Australia Education Council, 1992) 
wherein teachers are provided with carefully prepared rating proto- 
cols that they use with ad hoc examples of student work or on the 
basis of their accumulated experience with students. Curriculum- 
embedded tasks, such as the UK Standardized Assessment Tasks 
([SATs], Department of Education and Science, 1987b) and the 
Victoria Common Assessment Tasks ([CATs], Stephens, Money,
& Proud, 1991) are externally specified project prompts that are
interpreted locally to suit student needs and then scored by teach- 
ers. Within the CATs, control over the scoring varies from un- 
guided teacher judgments to local teacher judgments within exter- 
nal guidelines. Typically, SATs involve a tighter control over 
judgment and have therefore been placed a little further to the right. 

Assessments that are placed in different locations on the 
figure are often valued for different reasons. In figure 6.5, we 
indicate that assessments in the upper righthand corner are valued 
typically because they are perceived to have greater reliability; that 
is, they are composed of standardized tasks that are the same for all 
students, that can be scored using objective criteria, and that are 
congruent with existing psychometric models. Alternatively, as- 
sessments in the bottom lefthand corner are perceived typically to 
have greater instructional validity. That is, they are closer to the 
actual format and content of instruction, are based on the accumu- 
lated experience of teachers with their students, and allow adapta- 
tion to local conditions. It is desirable to have the positive features 
of both of these forms of assessment, but, as the figure illustrates,
no single assessment format encompasses them. 

Figure 6.5. Control chart for assessment formats: Perceived advantages
at extremes.

The assessment net uses information obtained from a variety
of locations on the figure; some information enhances validity and
other information increases reliability. The new student assessment
system being designed for Californian students (California
Assessment Policy Committee, 1991) is an example of an assessment
net. It is composed of three types of assessment activities.

Structured on-demand assessments include most forms of
traditional examinations. These may range from 15-minute quizzes
(of multiple-choice or open-ended format), to extended activities
that can take up to three class periods, to a performance-assessment
mode. Their distinguishing feature is that, although
they derive from the framework in the same way as a student's
regular instruction, they are organized in a more testlike fashion,
with uniform tasks, with uniform administration conditions, and
with no in-depth instructional activity occurring while they are
taking place. The on-demand assessments could typically be scorable
in a manner that involves little judgment on the part of the scorer,
or could be scored by expert judges. This class of assessment
information would reside at the top righthand corner in figure 6.4.

Curriculum-embedded assessments are to be seen as a part of
their regular instruction by students. They would be chosen,
however, from among the best of the alternative assessments,
collected, tried out, and disseminated by teams of master teachers.
They would typically be scored by the instructing teacher, although
the results could go through certain types of adjustment for
particular uses. This class of information would reside near the
middle of figure 6.4.

"Organic" portfolio assessments include all materials and 
modes of assessment that a teacher or student decides should be 
included in a student's record of accomplishments. They can 
include a varied range of assessment formats and instructional 
topics. Teacher judgment on the relationship between these records 
and the levels in the frameworks are the major form of assessment 
information derived from the portfolios. This information finds its 
place in the bottom lefthand corner of figure 6.4. 

Although each of the modes of assessment makes useful 
contributions to the overall assessment, what is needed is a way to 
integrate them and to ensure quality control. 



QUALITY CONTROL 

A procedure is needed to coordinate the information (in the form of 
scores, ratings, or other data) that comes from the several forms of 
assessment. Procedures are also necessary (1) to examine the
coherence of information gathered, (2) to map student performance,
and (3) to describe the structural elements (items and
raters) in terms of the strands or continua. Validity, reliability,
bias, and equity studies must be carried out within the procedure.
To meet these needs, we propose the use of generalized item-response
models (sometimes called item-response theory). Generalized
item-response models such as those described by Adams and
Wilson (1992a), Kelderman (1989), Linacre (1989), and Thissen and
Steinberg (1986) have now reached levels of development that
make their application to many forms of alternative assessment
feasible. The output from these models can be used for quality
control and to obtain student and school locations on continua that
may be interpreted both quantitatively and substantively.

We take an approach, based on Rasch-type models, because 
we need: 
1. A latent continuous variable as an appropriate metaphor for
many important educational variables;

2. The flexibility to use different "items," raters, etc., for different
students if we are to reap the promised benefits of novel assessment
modes;

3. A measurement approach that is self-checking (in this case, it
is termed fit assessment);

4. A simple building block for coherent construction of complex
structures;

5. A model that can be estimated efficiently from such basic
observations as counts.

This last, (5), is of considerable importance because it corresponds
to traditional educational practice for scoring tests and other
instruments.

The next section may present a particular challenge because 
it supports some of its ideas using notation with which readers may 
be unfamiliar. Although we will not give full details on the 
statistical model, we will describe briefly some of its key elements 
to illustrate how it can meet the flexibility requirements that 
alternative assessments demand. Those educators who wish to 
gain more in-depth information on the model or apply the model in 
a school or district setting are referred to a detailed description of 
a unidimensional version of the model and a marginal maximum 
likelihood algorithm used to estimate its parameters in Adams and 
Wilson (1992a) and a multidimensional version in Wilson and 
Adams (1992a). 

Suppose a test is composed of several items (I), where each
item has a number of response categories (K). The observed response
of any student to one of the items can be placed in one of
the mutually exclusive categories represented by K. We have used
the term item generically here. The items can, however, represent
much more complex phenomena than the traditional multiple-choice
test items we are used to (Wilson & Adams, 1992a). The
items could represent an entire set of questions that relate to a
common piece of stimulus material, for example, an item bundle
or a testlet, and the response categories could be the response sets
for the bundle. Or the items could represent a set of tasks that have
been scored by a group of raters, where the rater-task pairs are
considered as the item.

A statistical procedure allows scores to be assigned to a
student's performance on each item and a vector developed that
represents student abilities; it can be used to locate or map their
performance on the strands or continua within the curriculum
framework. Readers who are unfamiliar with matrix algebra may
wish to pass through this section quickly and dwell on the example
from Siegler that begins on page 000.

The vector $\mathbf{x}_{ni}$, which has one element for each
non-reference response category of item $i$, is used to denote the
responses of person n to item i, with a 1 placed in the category in
which he or she responded, and a 0 elsewhere. Note that a response
in the first category (which we are using as a reference category) is
denoted by a string of zeroes. By collecting the item vectors
together as $\mathbf{x}_n = (\mathbf{x}_{n1}', \ldots, \mathbf{x}_{nI}')'$, we can formally write the
probability of observing the response pattern as

$$
f(\mathbf{x}_n; \mathbf{A}, \mathbf{B}, \boldsymbol{\xi} \mid \boldsymbol{\theta}) =
\frac{\exp[\mathbf{x}_n'(\mathbf{B}\boldsymbol{\theta} + \mathbf{A}\boldsymbol{\xi})]}
{\sum_{\mathbf{z}} \exp[\mathbf{z}'(\mathbf{B}\boldsymbol{\theta} + \mathbf{A}\boldsymbol{\xi})]}
\qquad (6.1)
$$

where A is a design matrix that describes how the elements of the
assessments (e.g., raters and tasks) are combined to produce observations,
$\boldsymbol{\xi} = (\xi_1, \ldots, \xi_p)'$ is a vector of the parameters that describe
those elements, B is a score matrix that allows scores to be assigned
to each performance, and $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_D)'$ is a vector of student
abilities, or locations on the framework continua. The summation
in the denominator of (6.1) is over all possible response patterns and
ensures that the probabilities sum to unity. The model is applied to
particular circumstances by specification of the A and B matrices.
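
To make the role of the A and B matrices concrete, the following is a minimal sketch that evaluates a probability of the form exp[x'(Bθ + Aξ)] divided by the sum of the same expression over all admissible response patterns. It assumes that form for equation (6.1); the function names and the `rows_per_item` argument (the number of non-reference categories for each item) are hypothetical, and brute-force enumeration of patterns is used only because it is easy to read, not because it is how such models are estimated.

```python
import math
from itertools import product


def linear_predictor(A, B, xi, theta):
    """Return B*theta + A*xi as a list with one entry per row of A and B."""
    return [
        sum(b * t for b, t in zip(b_row, theta)) + sum(a * p for a, p in zip(a_row, xi))
        for a_row, b_row in zip(A, B)
    ]


def response_patterns(rows_per_item):
    """Yield every admissible pattern: each item has at most one 1 among its rows."""
    per_item = []
    for k in rows_per_item:
        options = [[0] * k]  # reference category: a string of zeroes
        options += [[1 if m == j else 0 for m in range(k)] for j in range(k)]
        per_item.append(options)
    for combo in product(*per_item):
        yield [value for block in combo for value in block]


def pattern_probability(x, A, B, xi, theta, rows_per_item):
    """Probability of pattern x under the assumed form of equation (6.1)."""
    eta = linear_predictor(A, B, xi, theta)

    def weight(z):
        return math.exp(sum(z_r * eta_r for z_r, eta_r in zip(z, eta)))

    denominator = sum(weight(z) for z in response_patterns(rows_per_item))
    return weight(x) / denominator


if __name__ == "__main__":
    # Three dichotomous items, one ability dimension (illustrative values only).
    A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    B = [[1], [1], [1]]
    print(pattern_probability([1, 0, 1], A, B, [0.0, -1.0, 1.0], [0.5], [1, 1, 1]))
```

Changing only A and B, and leaving the functions untouched, is what moves the sketch from one member of the model family to another.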

For example, consider the simplest unidimensional item- 
response model, the simple logistic model (SLM), otherwise known 
as the Rasch model (Rasch, 1980). In the usual parameterization of 
the SLM for a set of I dichotomous items, there are I item-difficulty
parameters. A correct response is given a score of 1 and an incorrect 
response is given a score of 0. Taking a test with just three items, 
the appropriate choices of A and B are 
$$
\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
\mathbf{B} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},
\qquad (6.2)
$$

where the three rows of A and B correspond to the three correct 
responses, the three columns of A correspond to the three difficulty 
parameters, one for each item, and the single column of B corre- 
sponds to the student location on the continuum. 

If the A and B matrices given in (6.2) are substituted into (6.1),
it can be verified that this is exactly the Rasch simple logistic model
(see Adams & Wilson, 1992a). The estimated parameters that result
from the application of the model would be a collection of item
locations and person locations on a continuum.
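
The verification can be sketched as follows, assuming that A is the 3 × 3 identity matrix, that B is a column of ones, and that the general model has the form exp[x'(Bθ + Aξ)] normalized over all response patterns. Under those assumptions the exponent separates item by item, and the sum over the eight patterns of zeroes and ones factors into a product:

```latex
\begin{aligned}
\mathbf{x}'(\mathbf{B}\theta + \mathbf{A}\boldsymbol{\xi})
  &= \sum_{i=1}^{3} x_i(\theta + \xi_i), \\
\sum_{\mathbf{z} \in \{0,1\}^3} \exp\!\big[\mathbf{z}'(\mathbf{B}\theta + \mathbf{A}\boldsymbol{\xi})\big]
  &= \prod_{i=1}^{3} \big[1 + \exp(\theta + \xi_i)\big], \\
f(\mathbf{x}; \mathbf{A}, \mathbf{B}, \boldsymbol{\xi} \mid \theta)
  &= \prod_{i=1}^{3} \frac{\exp[x_i(\theta + \xi_i)]}{1 + \exp(\theta + \xi_i)}.
\end{aligned}
```

Each factor is a simple logistic item-response function. With the positive sign assumed here, writing ξ_i = −δ_i gives the familiar θ − δ_i form in which δ_i is the item difficulty; using −1 rather than 1 on the diagonal of A would express the same model with ξ_i read directly as a difficulty.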

More complicated item-response models may be expressed 
using equally straightforward matrices. For example, the partial 
credit model (Masters, 1982; Wilson, 1992b) is designed for assess- 
ment situations with multiple levels of achievement within each 
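item. For an instrument with, say, three items and three categories
in each, with the categories scored 0, 1, 2, the A and B matrices are

$$
\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{A}_3 \end{bmatrix}
\quad \text{and} \quad
\mathbf{B} = (1, 2, 1, 2, 1, 2)',
\qquad (6.3)
$$

where each $\mathbf{A}_i$ is the 2 × 2 block of design entries for item $i$.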
The matrices have six rows, two for each item; recall that the
responses scored 0 do not appear in the matrix because they are used
as reference categories. The A matrix is a block diagonal matrix
indicating that each item is modeled by a unique set of parameters.
The B matrix contains the scores allocated to each of the responses,
and it has one column, corresponding to a single ability dimension.
For example, the ordering of the Victoria project criteria using this
model is given in figure 6.6. Such an ordering may be used to add
depth to the interpretation of the variable being assessed by the
projects. A similar ordering may be used to construct a continuum
for the phenomenography example (see Masters, 1992) and could
also be found for the Vermont portfolios, given suitable data.

17. Evaluation of conclusions

4. Interpreting and critically evaluating results

11. Understanding, interpretation, and evaluation of mathematics used

15. Defining mathematical symbols used

3. Analysing information

6. Breadth or depth of investigation

10. Use of mathematical language, symbols and conventions

7. Mathematical formulation or interpretation of problem situation or
issue

16. Account of investigation and conclusions

9. Level of mathematics used

5. Working logically

14. Relating topic to theme

12. Accurate use of mathematics

1. Identifying important information

18. Organisation of material

13. Clarity of aims of project

8. Relevance of mathematics used

2. Collecting appropriate information

Figure 6.6. Victoria projects criteria ordered by average rating.

A slightly more complicated example is provided by the
ordered partition model (Wilson, in press), which, in addition to the
features of the partial credit model, also allows for categories
within an item to have the same score. For an instrument with, say,
two items and five categories in each, with the categories scored 0,
1, 2, 2, and 3, respectively, the A and B matrices are

$$
\mathbf{A} = \mathbf{I}_8
\quad \text{and} \quad
\mathbf{B} = (1, 2, 2, 3, 1, 2, 2, 3)'.
\qquad (6.4)
$$

The matrices have eight rows, four for each item. The A
matrix is an identity matrix indicating that each response to each
item is modeled by a unique response parameter. The B matrix
contains the scores allocated to each of the responses, and it has one
column, corresponding to a single ability dimension.
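
The structure just described can be written out directly. The sketch below builds an identity A and a one-column B for two items scored 0, 1, 2, 2, 3, and then computes category probabilities for a single item. It assumes the general model has the form exp[x'(Bθ + Aξ)] normalized over response patterns, in which case an identity A lets each item be handled separately; the function names and parameter values are hypothetical.

```python
import math

SCORES = [0, 1, 2, 2, 3]  # category scores from the text; the first category is the reference


def opm_matrices(n_items, scores):
    """Build A (identity) and B (scores of the non-reference categories)."""
    nonref = scores[1:]
    n_rows = n_items * len(nonref)
    A = [[1 if r == c else 0 for c in range(n_rows)] for r in range(n_rows)]
    B = [[s] for _ in range(n_items) for s in nonref]
    return A, B


def category_probabilities(theta, xi_item, scores):
    """Category probabilities for one item on a single ability dimension.

    xi_item holds one parameter per non-reference category; the reference
    category contributes exp(0) = 1 to the normalizing sum.
    """
    logits = [0.0] + [s * theta + p for s, p in zip(scores[1:], xi_item)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    A, B = opm_matrices(2, SCORES)
    print(len(A), [row[0] for row in B])
    print(category_probabilities(0.3, [0.5, -0.2, -0.4, -1.0], SCORES))
```

Because two categories share the score 2, their relative probabilities depend only on their own response parameters and not on θ, which is the feature that distinguishes this model from the partial credit model.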

As an example of a continuum constructed using the ordered
partition model, consider the example from Siegler (1987) described
earlier. In his data, there were sixty-eight students with
complete data records, from grades kindergarten, 1, and 2. The
problems ranged from those as easy as "4 + 1 = ?" to the more
difficult "17 + 6 = ?" For illustrative purposes, I have chosen a
subset of the original item set: (A) 12 + 2, (B) 14 + 1, (C) 3 + 14, (D)
1 + 14, (E) 17 + 4, (F) 16 + 6. These are in three pairs: The first pair
is taken from Siegler's problem type 1, where the larger addend is
first and the smaller addend is relatively small (i.e., in the range
1-3); the second pair is taken from his problem type 2, which is the
same except that the larger addend is second; and the third pair is
taken from his problem type 4, where the larger addend is first and
the smaller addend is relatively larger (in the range 4-6), which
means that the sum is also relatively larger. Because there are six
items, the appropriate A and B matrices will have three times as
many rows as those shown in (6.4), but will otherwise have the
same structure.

Figure 6.7. A continuum in addition strategy use.

Figure 6.7 is a map of the continuum that has been constructed
from the calibrated item difficulties for ratings of student
strategy-use on the six addition items. The map has a vertical
scale (the numerical expression of the continuum) that represents
increasing difficulty, and in the middle panel the difficulty
thresholds for items are plotted. In this panel, we use the notation
X.n to indicate the difficulty of achieving level n in item X. The left
side of the figure indicates the distribution of student scores over
the continuum. The map relies on the fact that the measurement
model produces person-ability estimates and item-difficulty estimates
that are expressed on a common scale. A certain amount of
detail is lost in using this map (in fact, information is provided only
about the points at which successive cumulative probabilities
reach .5), but this is always the case with a summary. The thresholds
can be interpreted as the crest of a wave of predominance of
successive dichotomous segments of the set of levels. For example,
X.1 is the estimated point at which levels 1, 2, and 3 become more
likely than level 0 (for item X); X.2 is the estimated point at which
levels 2 and 3 become more likely than levels 0 and 1; and X.3 is the
estimated point at which level 3 becomes more likely than levels
0, 1, and 2.
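
The thresholds X.1, X.2, and X.3 can be located numerically once category probabilities are available. The sketch below is one way to do so under stated assumptions: category probabilities are taken to be proportional to exp(score × θ + ξ) with the reference category fixed at exp(0), the threshold is found by bisection on the cumulative probability, and the search interval, function names, and parameter values are all hypothetical.

```python
import math


def category_probabilities(theta, xi_item, scores):
    """Category probabilities for one item; the first category is the reference."""
    logits = [0.0] + [s * theta + p for s, p in zip(scores[1:], xi_item)]
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def prob_level_or_above(theta, level, xi_item, scores):
    """Cumulative probability of responding at the given level or higher."""
    probs = category_probabilities(theta, xi_item, scores)
    return sum(p for p, s in zip(probs, scores) if s >= level)


def threshold(level, xi_item, scores, lo=-20.0, hi=20.0, tol=1e-9):
    """Bisection for the theta at which P(level or above) reaches 0.5.

    Assumes the threshold lies inside [lo, hi]; the cumulative probability
    increases with theta when all non-reference scores are positive.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prob_level_or_above(mid, level, xi_item, scores) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    scores = [0, 1, 2, 2, 3]
    xi_item = [0.5, -0.2, -0.4, -2.0]  # illustrative values, not estimates
    for level in (1, 2, 3):
        print(level, round(threshold(level, xi_item, scores), 3))
```

Because the probability of reaching level n can never exceed that of reaching level n − 1, the thresholds for an item found this way are always in increasing order, which is what allows them to be plotted as successive points on the vertical scale.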

In the righthand panel of the map are descriptions of increas- 
ing competence with respect to strategy-use — this is the substan- 
tive expression of the continuum. These descriptions allow a 
substantive interpretation of the numerical location that is esti- 
mated for each student by the measurement model. For example, 
a student at the position denoted by 2.0 on the numerical con- 
tinuum would typically be expected to have a pattern of strategy- 
use like that described by the adjacent description on the substan- 
tive expression of the continuum; that is, he or she would be 
expected to use retrieval most of the time for half the items (A, B,
and D), but less for the other three items. The student would not yet 
be expected to display the patterns indicated by the descriptions 
above 2.0, and would be expected to have previously displayed the 
understandings indicated by the descriptions below 2.0. 

If we collected additional data from these students at a 
subsequent testing, we would obtain a second location for each 
student on the continuum. Hence, we can measure progress using 
the locations on the numerical continuum, and we can interpret it 
using the levels on the substantive continuum. 

As a second illustration of the flexibility of the measurement 
model, consider a more complicated example that may be more 
typical of alternative assessment. Students are given two problem- 
solving tasks, and two judges place the students' performances into 
one of the four categories. Category one represents no strategy and 
is assigned a score of 0, categories two and three represent alternative
but less sophisticated strategies and are both scored 1, whereas
the fourth category represents a superior strategy and is scored 2.

A model that allows for an estimation of the difficulty of the
tasks, as well as the relative harshness of the raters, and places the
students on a single continuum is specified by A and B matrices
with the following structure.

A and B have twelve rows, corresponding to the three possible
nonzero scores for each of the four "items" (rater-task combinations).
The first six rows are for the tasks rated by rater 1 while the
last six rows are for the tasks rated by rater 2. The first six columns
of A correspond to task parameters analogous to those in (6.3), and
the last parameter is a rater-harshness parameter, in this instance
estimating how the harshness of rater 2 compares to that of rater 1.
The rows of the B matrix are simply the item scores, and because
we are again assuming a single continuum, B has only one column.
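
One layout consistent with this description is sketched below. The row and column counts and the B scores follow the text; the entries of A are assumptions made for illustration. In particular, giving each task category its own column, and entering the harshness of rater 2 as the negative of the category score, are choices of this sketch rather than the authors' specification, and the function name is hypothetical.

```python
SCORES = [0, 1, 1, 2]  # category scores from the text; the first category is the reference
N_TASKS = 2
N_RATERS = 2


def rater_matrices(scores=SCORES, n_tasks=N_TASKS, n_raters=N_RATERS):
    """Return (A, B) for rater-task "items" under the layout assumed in this sketch."""
    nonref = scores[1:]
    n_task_params = n_tasks * len(nonref)
    A, B = [], []
    for rater in range(n_raters):
        for task in range(n_tasks):
            for j, score in enumerate(nonref):
                row = [0] * (n_task_params + n_raters - 1)
                row[task * len(nonref) + j] = 1  # task-category parameter
                if rater > 0:
                    # Assumed: harshness relative to rater 1, weighted by the score
                    # and entered negatively so that larger values mean harsher.
                    row[n_task_params + rater - 1] = -score
                A.append(row)
                B.append([score])
    return A, B


if __name__ == "__main__":
    A, B = rater_matrices()
    for a_row, b_row in zip(A, B):
        print(a_row, b_row)
```

With this layout the two raters share the same six task columns, so any systematic difference between their ratings is absorbed by the single seventh column.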

In the preceding situation, we have modeled only variation
in rater harshness. This is a simplistic view of how raters may vary.
Raters may also vary in the way they use the response categories;
some may have a tendency to give more extreme responses,
whereas others may prefer the middle categories. This, and many
other possibilities, could be modeled through different choices of
A. The most general approach would involve the estimation of a
separate set of item parameters for each rater.

In the case of multiple raters, maps like that illustrated in
figure 6.7 could be constructed for each rater, or they could be
constructed for the average rater. In a quality control context, the
ideal would be to use this approach to help raters align their
judgments. When this alignment process has resulted in a
sufficiently common map for all raters, we would need only a single
map. In the case of large numbers of raters, the model can be respecified
under the assumption that the raters have been sampled from a
population, and the model would estimate characteristics of the rater
population, in particular, the degree of variation between raters.

In using this model as part of an assessment net, one would 
need to apply the procedure to mixed-item formats. The technique 
described generalizes quite readily to such situations and allows 
the specification of different weights for different formats. For 
example, a teacher's end-of-year rating would occur in the model as 
one item, and this would need to be weighted according to beliefs 
about the relative importance of the summary end-of-year rating 
compared to ratings on specific tasks. 

Quality control information is also available in the assessment 
net. Standard techniques for assessing reliability, validity, fairness, 
and equity are available because of the measurement model's status 
as an item-response model (Hambleton, Swaminathan, Cook, Eignor, 
& Gifford, 1978; Lord, 1980; Wright & Masters, 1982).



CONCLUSION 

The methods just suggested are based on existing technologies in 
assessment and measurement. Some, such as frameworks and alter- 
native assessment, can hardly be said to be new. Others, such as the 
complex measurement models, are quite new (although a computer
program is now available to implement them [Adams & Wilson,
1992b]). They will, nevertheless, need adaptation to the tasks and
conditions of specific circumstances. This will require considerable 
research and development before large-scale applications can be made.

The examples described here have been unidimensional. In 
some circumstances, such as the incorporation of different types of 
raters — for example, fixed and random — into the assessment net, it 
makes sense to use a multidimensional approach. Such situations 
and appropriate modifications to the model are described in Wilson 
(1992b).



7 ❖ Connecting Visions of Authentic
Assessment to the Realities of
Educational Practice

M. Elizabeth Graue



What a difference a few years make! Discussions about assessment
in the 1980s focused primarily on the technical aspects of develop- 
ing measures of achievement and the use of tests to leverage 
educational reform (see, for example, Airasian, 1988; Burnes &
Lindner, 1985; Cohen, 1987; Mehrens & Kaminski, 1989). Conversations
about testing (and testing, rather than the broader term
assessment, was in fact the topic) centered on the concerns of 
policy makers and measurement specialists trying to refine tests as 
tools for reporting achievement for accountability. Testing instru- 
ments were developed and used for primarily administrative pur- 
poses; the spotlight was on assessment that informed various 
publics about the efficacy of education in the United States. 

Noticeably absent from discussions during this time were 
teachers and subject matter specialists. They did not possess the 
knowledge (defined in terms of technical principles of psychomet- 
rics or policy frameworks) or the power to enter the conversation; 
they were as often the target of activity as a party to development 
or implementation planning of new assessment practices. This gap 
was exacerbated in a system that imposed testing instruments on 
classroom settings for purposes other than instruction but that had 
instructional implications (Smith, 1991). Those tests, developed
using psychometric theory and technique that only measurement 
experts could understand, were increasingly mysterious to teach- 
ers. Assessment theory and technique have developed very sepa- 
rately from curriculum theory and practice. Theoretically part of 
the very same educational process, testing and assessment were, in 
practice, the property of measurement experts. Instruction and
curriculum were the province of teachers and content area
specialists.

As dissatisfaction grew with the social effects of assessment 
practices and the information that test scores could provide, new
ways of thinking about learning forced us to reevaluate the way we
monitored the educational process. The dual forces of curriculum
reform and the negative effects of test-driven "improvement" have
widened the circle of participants in assessment discussions beyond
the psychometricians and politicians to include content area
specialists and, potentially, practitioners. In addition, the focus of
these discussions has broadened beyond summative evaluations of
achievement to include more formative assessments for instructional
decision making. A ground swell of support for new ways of
finding out about what students know and how they know it has 
shifted the discourse on assessment, broadening the issues seen as 
salient, the tasks judged to be appropriate, and the technology used 
in analysis. 

The work presented in this book represents a new phase in the 
assessment discussion, an attempt to begin a conversation among 
those who have a stake in monitoring educational activities. A real 
attempt has been made here to broaden the vision of assessment 
from unidimensional instruments constructed to heighten the 
statistical properties of items. Here we have a vision of possibility 
that makes assessment the bridge for instructional activity, ac- 
countability, and teacher development. 

This vision, which values assessment of process as well as of 
outcomes, is becoming an accepted value by much of the educational
community. One of the key foundations of the new assessment
movement is a strongly held commitment to the congruence
of curricular beliefs and assessment activities. The idea of instruc- 
tional alignment (Cohen, 1987), which was prominent in the 
1980s, suggested that instruction should be directed by standards 
represented by test content. Increasingly, that idea has been turned 
around to suggest that assessment practice needs to match instruc- 
tional philosophy and activity. The actions of the mathematics 
education community are far ahead of the rest of the educational 
field because much of the necessary groundwork has already been 
done. As a set of guiding principles for practice, the NCTM 
Standards (NCTM, 1989, 1991) are unique in their specificity and
coherence. By beginning with issues of content and pedagogy and 
working toward assessment, the cart is finally placed behind the 
horse — we focus on building assessments that serve teaching and 
learning purposes and are congruent with our understanding of 
content structure and process. 

In addition, this movement should highlight the fact that the 
formats we choose for instruction or assessment are just that; they 
are choices that we make as professionals. Rather than existing
separately from practice as objective paths to truth, our assessment
activities, from fill-in-the-bubble tests to interviews, are socially
negotiated facts that are developed to provide information about
constructs that "carry with them in the process of interpretation a
variety of value connotations stemming from three main sources:
the evaluative overtones of the construct rubrics themselves, the
value connotations of the broader theories or nomological networks
in which the constructs are embedded, and the valuative implications
of the still broader ideologies about the nature of humanity and
society that frame the construct theories" (Messick, 1981, p. 12).
This idea is echoed in Cherryholmes's (1988) discussion of construct
validity: "Construct validity and research discourses are shaped, as 
are other discourses, by beliefs and commitments, explicit ideolo- 
gies, tacit world views, linguistic and cultural systems, politics and 
economics, and how power is arranged" (p. 428). 

These choices represent a certain set of values and ideas about 
the way that content, students, and teachers interact in classroom 
settings. One of the choices made in this book, by all of the authors, 
represents some variant of social constructivist theory as a founda- 
tion for their discussions of assessment. In using social 
constructivism as a foundation for our practice, we shift to new 
ideas that rearrange those interactions in profound ways. In this 
context, we must be careful not to rely on forms that personify old 
models of content, learning, and teaching, even if they have track 
records and a foundation of evidence that appear to make their use 
defensible. Even that evidence has a history and philosophy that 
links it to very specific conceptions of the educational process. 
These choices bring with them new challenges and responsibili- 
ties. The work in this book represents some of our best efforts to 
struggle with the ramifications of the move to constructivist 
approaches to education and its embodiment in assessment. 

Thus, there is significant coherence in the goals set out by the 
book's authors. Three general themes are prominent in their 
investigations of authentic assessment in the context of reform: 
The critique of assessment that is ongoing, the frameworks being 
developed to guide assessment, and the nature of assessment 
activity. I review these issues in the next section. 



CRITIQUE

The first theme, critique, lays out the problems inherent in our 
current assessment practices. The process of critique has been the 
most prominent one; the resulting message is that what we have
been doing does not match our aims and our current knowledge
base. All of the authors point to the implications of changing our
instructional philosophy — they forcefully address the need to rethink
assessment practices. This shift implies a new relationship
between assessment and instruction. By its very nature,
constructivist teaching requires a merging of the two activities.

This merging of assessment and instructional philosophy and 
practice has pragmatic implications for the form activities take and for
the meaning that we place on results. De Lange points out that in
measuring the process and outcomes of new forms of mathematics 
curriculum, we must structure our assessments using constructivist 
(or, as he suggests, realistic mathematics) theories. Using other types 
of evidence to measure effectiveness would be comparable to deciding 
whether someone should make the basketball team using criteria 
developed for the swim team. Both are sports, but the demands, 
processes, and issues are rather different. The development of new 
assessment theory, which is vital as we explore authentic assessment 
strategies, has been called for by Romberg and Wilson in these pages, 
as well as by others in the measurement community (Fredericksen &
Collins, 1989; Linn, Baker, & Dunbar, 1991; Moss, 1992).

The most pointed discussion of this issue is put forward by Stake 
in his argument about the utility of standardized tests to inform our 
understanding of mathematics learning. His chapter reminds us of the 
strange bind we find ourselves in when we rely on standardized tests, 
as they are currently constructed, to portray the nature of educational
activity in mathematics classrooms. The simplistic nature of testing
instruments misrepresents the complexity of mathematics as a knowledge
system and has led us to overestimate the utility of tests to inform
educational practice. This points to the key idea that assessment
instruments as tools are very good for some purposes and not so good
for others. Keeping in mind appropriate uses for the tools we use is the 
only way we will get out of the testing morass that has characterized 
recent educational practice. 



FRAMEWORKS TO GUIDE ASSESSMENT 

Coming out of the current atmosphere of critique are frameworks 
for assessment congruent with a reformed view of mathematics. 
The recognition that it is necessary to map out the theory, values, 
and practice of assessment in a way that goes beyond psychometric 
considerations is a new and welcome approach that highlights its 
situated nature. Romberg and Wilson, as well as Lajoie, set out to
build a vision of authentic assessment: The authors are clear that
new assessment practices in mathematics should be shaped jointly
by the NCTM Standards and evolving knowledge about socially 
developed cognition. Sharing many basic values for the develop- 
ment of authentic assessment, the ideas presented in these chap- 
ters are not redundant but, instead, highlight one of the key 
concepts that this book contributes to the literature — that educa- 
tional practice, from systems planning to the development of 
individual instructional and assessment tasks, should be infused 
with common theoretical threads. Only then can we hope to avoid 
the conflicts that have characterized educational change in the 
recent past. 

Romberg and Wilson speak to the broad vision of assessment 
that comes out of commitment to the Standards, moving beyond 
statements regarding curriculum content and instructional activ- 
ity and extending the ideas from this consensual agreement for 
practice into plans for an assessment system. Their discussion is 
characterized by sweeping strokes that call for integration across 
the diversity of activities in education. Lajoie's contribution can be 
seen in her explication of two components of the new theories of 
learning: (1) work in the areas of situated cognition and the more 
inclusive idea of social constructivism, and (2) her application of 
these ideas to build a framework for authentic assessment tasks. 



ASSESSMENT ACTIVITY 

The third theme is one of activity: In what kinds of tasks should we 
engage students to find out what they know? This is the primary 
focus of Silver and Kenney's, de Lange's, and Wilson's chapters, 
each of which approaches the problem from very different vantage 
points. Working from a position most similar to current practice, 
Wilson focuses on the development of psychometric models of 
assessment that are amenable to formats more open than multiple 
choice. Wilson's theoretical model includes dimensions of perfor- 
mance and knowledge that are salient in a more complex view of 
learning. The flexibility built into this proposal is a great advance 
from the limited traditional models of measurement that collapse 
performance on items or a collection of items into a single category; 
including issues such as strategy choice, as well as type of answer 
given, makes a much richer picture. 

A symbolic question needs to be asked in a context in which
highly sophisticated theories are used in analysis: How does the 
value of these new ways of unraveling the meaning of performance
on authentic tasks alienate those without technical expertise even
further than we have alienated them in the past? For those who are
not highly conversant with the mechanics of item response theory, 
this approach might appear counterintuitive when compared to the 
rich, local knowledge provided by authentic assessment. Wilson's 
model has great promise, particularly in its ability to blend a variety 
of types of information generated in the course of instruction. I 
bring up these questions to those in the measurement community 
to remind them that part of their job is to justify and explain 
theoretical models to all participants because the practice of 
assessment has been broadened beyond the technically proficient. 
If assessment is seen as communication, having a clear way to 
communicate its process and products is vital. Only then will 
people have the confidence to use the tools that we develop.

The chapter by de Lange contains an impressive assortment 
of assessment tasks that enhances the possibilities for the types of 
activities we can use to probe student understanding. Even reading 
through these tasks forces a rethinking of mathematics activity; it 
represents an active approach to the field that does not allow for 
simple questions or simple answers. These assessment tasks pro- 
vide an example of activities that are simultaneously evaluative 
and instructional; it would be hard to imagine a student whose 
imagination could not be captured by this fresh new way to 
approach mathematical ideas. The author's discussion of issues in 
the development of assessment tasks provides us with new ways to 
think about how we construct activities. In posing the importance 
of levels in assessment, the role of context, necessary and sufficient 
information, and a variety of test formats, U.S. educators have the 
opportunity to think beyond simple open items or multiple-choice 
formats. 

Importing these ideas into U.S. classrooms will require, 
however, further elaboration so that vaguely familiar ideas are not 
transformed in the old measurement model context. It would be
very easy to collapse the idea of levels of assessment into either a
simplified version of Bloom's taxonomy or item difficulty
— and that is not the intent of the author. Incorporating novel
ideas into existing schemas can be tricky, and in this case it appears
doubly so. 

Silver and Kenney orient their chapter in terms of assessment
and instructional practice. They clearly differentiate between
assessment for decision making outside the classroom and assessment
that serves day-to-day teaching decisions. This is an important
distinction, particularly at this point in social history. Our
attention to tests and critiques of assessment practices has focused
primarily on the role of external assessment in education. In 
reorienting us to the differences between external and internal 
forms of assessment and heightening the salience of internal 
assessment for valid instruction, we can begin to bridge the gap 
between instructional and assessment practice, weaving the two 
together so that they more adequately inform one another. Giving 
authority to nonquantitative strategies for tracking student growth 
provides the teacher with a wider range of alternatives for gathering 
information and underscores the importance of ongoing monitor- 
ing of classroom activity. 

It is exciting to see extended discussions of nontraditional 
assessment tasks like those presented by de Lange and Silver and 
Kenney. They provide a glimpse of the possibilities available to 
teachers in their instructional practice. Such strategies, from inno- 
vative tasks to interviews, promise a wealth of rich, contextualized 
information. But as presented, they make the reader hungry for 
more — for a discussion of strategies beyond data collection. How 
do teachers manage this information, make sense of it, and commu- 
nicate it? Wilson's model is available for large-scale use but would 
not be appropriate for the day-to-day analysis activities that teach- 
ers face. These new tasks and strategies will require intense 
analytical development, with attention to the fact that the philoso- 
phy of analysis must match the philosophy of instruction and data 
collection. Otherwise, it is quite likely that we will fall back on 
easy-to-develop single scores to conflate the elaborate information
available in authentic assessment, all in the name of entering the 
results in the grade book or to justify judgments made to parents 
and administrators. Just as we have been advised that multiple 
strategies for assessment will be needed, it would be fair to guess 
that multiple strategies for analysis will be required as well. 



CONVERSATIONS STILL TO BE HAD

So what is missing in this book? Have we pushed the ideas of
constructivist assessment as far as they can go? Several chapters
remain unwritten, both in this text and in the discussions that have
occurred about assessment in the rest of the educational community.
As we have moved our discussions to new forms of assessment,
we have focused on critiques of old models and suggestions
of new strategies and tasks. What we have not explored too
carefully are next steps that require us to give up many of the
traditional views of testing: exploration of ways to understand and
make use of the information gathered through authentic tasks, 
discussion of an assessment system that has very specific tasks and 
purposes, and examination of the social utility and impact these 
new forms will have on schooling and those who participate in it. 
These missing discussions suggest that we may have fallen prey to 
the very problem that de Lange warned us of — we have not stepped
away from the old perspectives of assessment. 

The absence of such discussions could be seen as a remnant 
of thinking from the perspective of the old models of measure- 
ment. Some of those working in the field are still clinging to 
notions that were developed in what Berlak (1992) calls a psychometric
paradigm rather than moving completely into a contextual
paradigm (p. 12). In old models, the standardized test as a tool
included within it the prescription for action; the results of a test 
provided an assumed blueprint for what happened next. Low 
scores, premised on a ranking notion of achievement, indicated 
the need for remediation, usually through reteaching or place- 
ment into another instructional group. This is shown most 
clearly in Shepard's (1991) work, which indicated that testing 
directors in U.S. schools tended to work from a factory-oriented, 
behaviorist model of learning in their understanding of testing 
and achievement. 



LIFE AFTER TEST ADMINISTRATION 

With new forms of assessment, the focus of the process does not end 
with the development of sophisticated items that tap multiple 
levels of understanding or with the development of technology to 
model traits that are seen to underlie performance on an instrument.
In a very basic sense, the implementation of an assessment
task is just the beginning — there is much to do after an instrument 
is given to a group of students. We have not pushed that idea as we
have continued to rely on old notions of posttest activity in the 
form of scoring item performance. Although the discussion has 
expanded to include the use of both trained teachers and "experts" 
and involves a wider range of considerations than simply whether 
an answer is right or wrong, we are still not extending our discussions
toward more interactive ways of understanding the nature of
student activity. The focus of the conversation still veers toward 
standardized, end-point decisions regarding what a performance 
represents about some underlying trait.

If we are thinking about learning in the way that Romberg
and Wilson suggest in their chapter, as an "image that is gradually
brought into sharper focus as the learner makes connections, or
perhaps like a mosaic, with specific bits of knowledge situated
within some larger design that is continually being reorganized in
an organic manner" (p. 5), then assessment should be seen as the
lens used by both teachers and students to view the learner's
increasing competence. Creating a view of assessment that is as
dynamic as the process of learning requires us to broaden our
vision of (1) what constitutes an assessment activity, (2) how that
activity might be understood, and (3) how it might be used to
improve instruction. The first step is what has been most clearly 
discussed in this book: A variety of tasks were proposed, with new 
ideas for scoring performance on items. Although an important 
activity in the total assessment enterprise, the suggestion of tasks 
and of criterion-related scoring rubrics does not take us to the end 
of the line. 

I would argue that our next push in the field must be in a move 
away from singular attention to scoring items to interpreting 
activity. From this perspective, what we do is more like reading the
meaning of what a student does rather than scoring what that 
student knows. It is a much more interpretive process, one that 
requires intimate knowledge of context, social relations, and the 
meaning of any particular act. Our visions of assessment 
mirror our visions of learning — I would apply Romberg and Wilson's 
description of mathematics to the activity of assessment by replacing
the word mathematics with the word assessment: Assessment
is a set of rich, interconnected ideas. Assessment should be viewed 
as a dynamic, continually expanding field of human creation, a 
cultural product. Assessment is learning, directed by teachers and 
providing many ways of knowing for teachers, students, parents,
and other publics. 

This idea is hinted at by Romberg and Wilson in their 
discussion of the opportunity that authentic assessment provides 
for professional development; they give several examples of teach- 
ers making the professional judgments necessary to score open- 
ended forms of performance tasks, both in the United States and 
abroad. However, the argument is still framed in terms of scoring
and emphasizes the need to "train" teachers so that they can see the
underlying structure of student performance. Are there ways that
we can open up the discussion?

One way to do this is to heed Moss's advice (1992) to explore
alternative epistemological orientations for developing assessment
and interpretation strategies. This approach has already been
proposed by a number of scholars studying assessment alternatives 
(Berlak, 1992; Johnston, 1989) and has been attempted by a growing 
number of practitioners. A good example is the interpretive model 
for communicating about portfolios suggested by Moss et al. (1992).
The framework they have developed encourages teachers and 
students to work together to make creative and intellectual choices 
reflecting local goals and interests. The teacher's interpretations of 
student work are the main component of the evaluation and are
built through teacher narratives of student work, paired with 
samples of the work itself. This allows for grounded interpretations 
of student growth that can be reanalyzed by anyone who reads the 
case. It is distinctly different from an empiricist view of measurement;
it is much more locally generated and qualitative. It relies on
the professional knowledge of teachers and the relationships they
have with their students.



THE VARIED NATURE OF ASSESSMENT 

Taking the approach just described honors the complexity inherent 
in any single performance or a group of performances: It 
contextualizes performance in the social setting in which it occurs. 
But to do this we need a clear picture of what the assessment 
enterprise is all about. One of the difficulties inherent in contem- 
porary discussions of assessment is that we use the term assess- 
ment to refer to very different kinds of activities. On the one hand, 
it is used by Wilson to describe standardized, externally developed 
tasks and teacher judgments that are used to gain an understanding 
of a latent variable like student position along curricular continua.
On the other hand, it is used by Silver and Kenney to describe 
strategies that teachers can use in the course of instructional
practice to gather information about student learning. Somewhere 
in between fall de Lange's multilevel tasks. 

As a whole, the chapters in this book provide a glimpse of the
elements of the assessment process in its totality, but the system,
or the package, is not yet explicitly described. We have learned from
recent history that when different forms of assessment encroach on
one another, problems ensue. Explicitly mapping the questions to
be answered, the decisions to be made, and the tools we develop to
do the job is a vital first step as we move to integrated assessment
programs. This is especially important in a transitional era such as 
the one we are going through today.

As forms of assessment are adapted to the changing needs of the
educational system, it is vital to have a clear view of an assessment 
system, with all of its components, so that assessment activities can 
be balanced across needs. This requires explicit statements of the 
requirements of assessment paired with prescriptions for the imple- 
mentation of appropriate strategies for generating information. Al- 
though all assessment is pointed to educational improvement, an 
assessment system is composed of elements developed to answer 
very different kinds of questions posed by very different publics. 
Questions of accountability to external audiences require the devel- 
opment of measures that examine broad visions of content and 
student performance within that content. The question of whether
we are looking at national snapshots of skills, or state or district
estimates of learning, will shape the nature of the tasks developed 
and the kinds of precision required in analysis. The practice of 
assessment in a local classroom requires very different levels of 
information, involving short-term judgment of students' developing 
expertise on highly specific educational content. We do not cur- 
rently have the technical expertise to bridge these various assess- 
ment purposes (although Wilson's model is definitely a step in the 
right direction), and the political landscape of the schools makes 
attempting it questionable. Until we have confidence that the 
various assessment purposes are parallel, it is important to specify 
what level of assessment we are addressing in our conversations. It 
also requires that all forms of assessment be valued for their own 
specific purposes; that the authority for valid information be spread 
from the external and standardized instruments to include the 
appropriate use of strategies such as observation and interview. 



ARE THESE NEW FORMS OF ASSESSMENT USEFUL?

Linked to the issue of definition of the assessment enterprise is the 
need to develop ways to monitor the effectiveness and impact that 
new forms of assessment have on those involved in the educational
system. Traditionally, the criteria against which assessment in- 
struments and practices were measured included psychometric 
concepts like predictive validity, or bias, and economic factors like 
per-pupil cost. As we move to new forms of assessment, the criteria 
that we consider need to be congruent with the philosophy of the 
model employed, broadening to include assessment of impact on 
the school environment as a whole and to the social and institutional
costs that go with new practices (Moss, 1992).

As we move through this era of reform in instructional and
assessment practice, for example, we need to monitor the practical
utility of the proposed assessment strategies, instruments, and
analysis methods to ascertain whether they are doing the jobs we
have set out for them. In particular, it is important to examine how 
new forms of standardized assessment provide information and their 
impact on classroom practice. The rationale for the development of 
formats like performance assessment is founded in part on the hope 
that it will provide more effective information to shape the practice 
of instruction (Wolf, Bixby, Glenn, & Gardner, 1991). Whether they
can in fact accomplish this purpose is dependent on the subtle 
interaction of such factors as ( 1 ) our ability to develop the technical 
expertise to interpret performance on authentic tasks, (2) the practi- 
cal utility of the strategy, and (3) the consequences connected with 
performance. We need to be mindful that nothing is inherently more 
useful, mithful, or real about new assessment tasks or strategies — 
their promise is directly related to the context in which they are 
used. The formats chosen represent a single piece of a larger picture. 
Rather the practice of cv^sessment— the relationships formed, the 
interpretations made, the understandings generated, and the actions 
taken — defines the appropriateness of our assessment choices. The 
standards for validity developed in the psychometric paradigm have 
been expanded to include consideration of the consequences of 
assessment activity (Messick, 1989; Moss, 1992) and that standard 
still applies as we work to develop authentic assessments. Again, the 
utility of the tool is related to the task to which it is applied. 

Calls for new forms of curriculum and assessment have come 
in part out of a concern about bias in the traditional approaches to 
teaching and testing. The distance between the life experiences of 
diverse learners and the activities in which we engage them is in 
theory bridged when we use real-life problems and assessment 
tasks in the classroom (Brown, Collins, & Duguid, 1989). One of the
principles suggested by Lajoie for operationalizing authentic as- 
sessment in mathematics called for considerations of "racial or 
ethnic and cultural biases, gender issues, and aptitude biases" (p. 
31). Measures of bias have typically focused on differential perfor- 
mance on tasks that sample outcomes of learning. As we move to 
more authentic forms of assessment, our view of bias must adapt 
to the philosophy and structure of the tasks that we propose and 
must be sensitive to issues of power in the implementation of
assessment in the schools. 

These assessment alternatives have implications for the
relations among teachers, students, parents, administrators, and
public audiences interested in education. They open up possibilities for
participation, responsibility, and reflection; or they have the
potential to increase alienation, capricious judgment, and lack of
understanding. The authenticity of assessment relies on participant
engagement in tasks and productive access to the information
generated. Nothing is authentic about a task that systematically
excludes people due to its structure. Ideas about bias should be
broadened to include access to the process of assessment (i.e., How
is student engagement facilitated by this strategy?) as well as
productive understanding of the products it provides (i.e., How do
parents understand the information presented in forms like
portfolios, learning progress maps, or profiles?). We need to pay attention
to the manner in which the tools of assessment invite or inhibit 
participation in their use, because this participation is the key to 
their utility. From this perspective, bias is a matter of equality of 
opportunity rather than equality of outcomes. 

Connected to the need to monitor the use of new forms of
assessment is the necessity of being mindful of the conditions
under which teachers are asked to implement what amounts to
radically different approaches to instruction and assessment.
Essentially, teachers are being called upon to take responsibility and
authority for assessment activities that were seen as technically 
beyond their expertise and that were, worse, tasks that the public 
did not trust them to do. 

Traditional testing practices were developed from the factory
model of education (Shepard, 1991), and the structure of most
schools has retained that model in the form of age-graded
groupings, grade-specific curricula, and standardized testing at almost
every level. The new forms of assessment that have been proposed
do not fit neatly into a compartmentalized or atomized system. If
these new responsibilities are going to pay off for both teachers and
students, then the school as an institution must adapt to the
necessary change in order to facilitate the transformation. At the
very least, any suggestion of reformed practice, whether it is
instructional or assessment-oriented, needs to make parallel
recommendations of structural change that make the reform possible.
The NCTM Standards (1991) outline the need for teacher
development in the area of curriculum, but, historically, assessment has
been seen as very separate from instructional concerns. Teacher
development in the area of assessment should be parallel to or even
embedded in teacher development in the area of instruction.

Given the labor-intensive nature of locally developed and
relevant assessment practices, institutional changes will need to
occur in teacher workload, staffing patterns, and economic
investments in schools. For example, simple paper-and-pencil tests are
attractive to many teachers because they are efficient ways to find
out certain things about students. They do give us some
information about outcomes, but other strategies tell us even more.
Unfortunately, these alternatives to traditional tests take an
immense amount of time and discipline to implement, both in
collecting data and in managing and communicating it (Gomez,
Graue, & Bloch, 1991). Unless the structural constraints on
assessment practice are taken seriously, the practices themselves may
add to conditions that create inequality for teachers, students, and
families (Apple, 1992). The burden of making this work has been
placed quietly on teachers whose list of responsibilities gets longer
every day and whose reputations have been shaped by questionable
assessment policies in the past. Making reform work amounts to
more than just changing people's minds about participating in good
practice. The social context of the practice of assessment is just as
important as the technical implications as we attempt to facilitate
the development and implementation of authentic assessment.



CONCLUSION 

The work presented in this book can be seen as a second generation
of discussions on authentic assessment. Taken as a whole, the 
authors' ideas move beyond suggesting the need for new ap- 
proaches and describing isolated strategies for assessment in math- 
ematics — they suggest an integrated assessment system that has a 
shared philosophical base. The coherence of the proposals provided 
here is one of the book's greatest strengths; in generating
assessment ideas from within curricular content, the authors present
models that avoid the fragmentation of instruction and assessment 
practice that has been such a problem in recent years. 

In this transitional period, we are moving from atomistic 
views of teaching and testing to interactive visions of learning and 
assessment. In the process, we are shedding old ideas of what 
constitutes "real" information about student learning and
reaching across disciplinary boundaries to construct new ways of
knowing about the educational process. The key to making this work lies
in continuing to explore the possibilities that exist, often in places 
where we might least expect to find them. In dismantling the 
boundaries that have held traditional measurement practice in 
place, we are more likely to come across ideas and strategies that
will fit the needs of a reformed curriculum. New assessment theory
and practice will need to be generated from within the frame of 
curriculum content as well as by the measurement community. 
Our best bet may be to form collaborations whose purpose is to 
develop strategies congruent with the spirit of the subject matter 
and within the limits of the environment in which that subject 
matter is taught and learned. 



REFERENCES 

Airasian, P. W. (1988). "Measurement driven instruction: A closer
look." Educational Measurement: Issues and Practice 7, no.
4:6-11.

Apple, M. W. (1992). "Do the Standards go far enough? Power, policy,
and practice in mathematics education." Journal of Research in
Mathematics Education 23, no. 5:412-431.

Berlak, H. (1992). "The need for a new science of assessment." In H.
Berlak, F. M. Newmann, E. Adams, D. A. Archbald, T. Burgess, J.
Raven, & T. A. Romberg, Towards a new science of educational
testing and assessment. Albany: State University of New York
Press.

Brown, J. S., Collins, A., & Duguid, P. (1989). "Situated cognition and
the culture of learning." Educational Researcher 18, no. 1:32-42.

Burnes, D. W., & Lindner, B. J. (1985). "Why the states must move quickly
to assess excellence." Educational Leadership 43, no. 2:18-20.

Cherryholmes, C. H. (1988). "Construct validity and the discourses of
research." American Journal of Education 96, no. 3:421-457.

Cohen, S. A. (1987). "Instructional alignment: Searching for a magic
bullet." Educational Researcher 16, no. 8:16-20.

Frederiksen, J. R., & Collins, A. (1989). "A systems approach to
educational testing." Educational Researcher 18, no. 9:27-32.

Gomez, M. L., Graue, M. E., & Bloch, M. N. (1991). "Reassessing
portfolio assessment: Rhetoric and reality." Language Arts
68:620-628.

Johnston, P. H. (1989). "Constructive evaluation and the improvement
of teaching and learning." Teachers College Record 90, no.
4:509-528.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). "Complex,
performance-based assessment: Expectations and validation criteria."
Educational Researcher 20, no. 8:15-21.

Mehrens, M. A., & Kaminski, J. (1989). "Methods for improving
standardized tests scores: Fruitful, fruitless, or fraudulent."
Educational Measurement: Issues and Practice 8, no. 1:14-22.








CONNECTING VISIONS OF ASSESSMENT TO PRACTICE ❖ 275



Messick, S. (1981). "Evidence and ethics in the evaluation of tests."
Educational Researcher 10:9-20.

Messick, S. (1989). "Validity." In R. L. Linn (ed.), Educational
measurement, 3rd ed. New York: Macmillan.

Moss, P. A. (1992). "Validity in educational measurement." Review of
Educational Research 26, no. 3:229-258.

Moss, P. A., Beck, J. S., Ebbs, C., Matson, B., Muchmore, J., Steele, D.,
Taylor, C., & Herter, R. (1992). "Portfolios, accountability, and an
interpretive approach to validity." Educational Measurement:
Issues and Practice 11, no. 3:12-21.

National Council of Teachers of Mathematics (NCTM). (1989).
Curriculum and evaluation standards for school mathematics. Reston,
VA: Author.

National Council of Teachers of Mathematics (NCTM). (1991).
Professional standards for teaching mathematics.
Reston, VA: Author.

Shepard, L. A. (1991). "Psychometricians' beliefs about learning."
Educational Researcher 20, no. 7:2-16.

Smith, M. L. (1991). "Put to the test: The effects of external testing on
teachers." Educational Researcher 20, no. 5:8-11.

Wolf, D., Bixby, J., Glenn, J., & Gardner, H. (1991). "To use their minds
well: Investigating new forms of student assessment." Review of
Research in Education 17:31-74.






CONTRIBUTORS 



Jan de Lange
Freudenthal Institute
Utrecht, The Netherlands

M. Elizabeth Graue
University of Wisconsin
Madison

Patricia Ann Kenney
University of Pittsburgh

Susanne P. Lajoie
McGill University
Montreal

Thomas A. Romberg
University of Wisconsin
Madison

Edward A. Silver
University of Pittsburgh

Robert Stake
University of Illinois
Champaign

Linda D. Wilson
University of Delaware
Newark

Mark Wilson 
University of California 
Berkeley 







SUBJECT INDEX 



alignment 

and authentic assessment, .^0-1, 261 

and behaviorism, 4 

and curriculum reform, 7-S, SP2-.^ 

definition of, 198 

difficulties of, 196-9, 214

as simplification curriculum, 

199 

assessment feedback. See feedback
assessment levels
higher level, 103-9
lower level, 96-7
middle level, 97-102
reason for development of, 94-6
assessment nets 

California Framework as example
of, 246-8

discussion of, 246-49
Australian IMPACT Project
as example of authentic assessment,
41

authentic assessment. See also
Curriculum and Evaluation
Standards for School
Mathematics (NCTM); reporting
procedures; task development
argument tor development . 2. 19. 
24 16S-9 

assumptions underlying 2-4 6 

iH-.io 

and continuum maps, 24.TS6 
criteria tor, 249-40. 241 
tramewoiks hir. 19-.^*^, 24"'-44. 24S, 
264- 1 

4IS insiiuctional guidance, ks-sts, 
264-6 

issues in, 1-lS, 168-169, 269 -2’’4 
4ind learning theories. 11 12. 2^6 
principles tor, 10 1 I , ^0 M 94 4 
iiualu\ coniiol IP. 2^"'. 2 1s S(^ 



behaviorism

and mathematics education -1 4



British Committee of Inquiry into the
Teaching of Mathematics in
Schools
goals of, 89

Bush, (President) George
and national educational goals, 1,
199, 201

California Achievement Test
as example of standardized
achievement tests, 41
California Assessment Program
California Framework, 24 T-k. 246-8
as example of authentic assessment,
42

and open-response tasks, 69, tq 7^^
and portfolios, 7,^

Chapter 1 programs 

and standardized achievement tests
as criteria for placement in, 44-4
classroom tests. See teacher-made
tests

Cognitively Guided Instruction 

as example of authentic assessment,
42

cognitive psychology

cognitive apprenticeship as model of
instruction, 26-7

and theories of learning. 4-"'. i0. ''4
College Board Advanced Placement tests
and open-response tasks, 69
communication skills

and California Assessment Program,

U

and classroom tests, 6.s, 144
criteria for evaluation of, 7(i
and Curriculum and Evaluation
Standards for School
Mathematics (NCTM), 9 21. 24
49 Ml

development of, in mathematics
education. 2^4
and equity issues 24







computers 

effect of, on mathematics education,
87-S, 92 

use of, in mathematics education, 

20, 22-3, 28-9 
conjecturing 

as mathematics skill, 4, 20 
connected curriculum

use of, in mathematics education, 
24-5 

Connecticut Common Core of 
Learning Project

as example of authentic assessment, 
32 

constructivism 
assumptions of, 5-0, 91 
as foundation for authentic
assessment, 10-12, 262, 263, 264
as instructional guidance, 91-3
as result of changes in mathematics,
and testing, 6-7
content inventories. See also
curriculum; problem-solving
skills; real-world activities;
task development

organization of, 1 7g-9. 180-2. 194-5,
223n 

and teachers, 182, 183-4, 214, 220n 
core curriculum 
and decentralization of school 
management, 204 

criterion-referencing. See also district/
state-level tests
emphasis of, 223n, 224n
curriculum. See also content

inventories; public accountability;
task development
and alignment with testing. 4 1 - C 
46. 4"". 196-9, 214 
change in. 92, 16^, 201, 21 1 M 
and curriculum coordinators, 1S4 
effect of testing on, 213
and International Assessment of
Educational Progress (IAEP), 46.

4"

and National Assessment of

Educational Progress (NAEP), 19
and national education reform, s,
9i)

and teachers. 199, 220n
Curriculum and Evaluation Standards
for School Mathematics (NCTM)



and authentic tasks, 25, 30-1, 76 
and balanced assessment, 162-3 
and development of standards, 1-2, 
7-9 

general assumptions of, 4, *^-8, 10- 

11,21 

and goals of mathematics education, 
89-90 

and instructional guidance, 26, 58- 
60, 68, 72, 220n 

and learning theories, 5-6, 27, 264 
and National Assessment of 
Educational ProgressiNAEPV *4 
and questioning as assessment tool, 
62 

and writing as assessment tool, 63 

Department of Labor
and criteria for high school
graduation, 20
district/state-level tests
changes in, 

as external assessment, 45-S 
as instruetional guidance, 39 

equity issues

in assessment, 31, 34, 248, 2"^ 1-2
and item-response theory, 256
in task development, 121-4
evaluation criteria

in QUASAR assessment, 69-70
in Vermont portfolios, 7g
exploration 

as mathematics skill. 4 
external assessment. See also

standardized achievement tests;
testing

and classroom instruction. 40 56
56, 7S-fi, 265-6
limitations of, 54 5, "^5
national/international surveys, \s,
U), 4S -9, 52 4
reform of, 76, 27()
and scoring 1 <
state/district-level tests 19

feedback

adaptive (scaffolding), 26 "
and computers, 21, 28 9
and observation, 61







First Steps project (Western Australia)
as example of assessment
framework, 237
Freudenthal Institute
and construction of tasks, 91. I.s6

Goals 2000 

and mathematics achievement, 4S
graphs

as assessment tool, 22“3, .^3, 1.^3-6 
group activities 

and communication skills, 24, 2.^, 
.38, 39 

and curriculum, 39 

design of, 28, 3 1 , 32 160, 164-6 

and problem-solving skills, 1~'

Hewet Mathematics Project (The
Netherlands)

as alternative assessment project, 33
homework 

as assessment tool, 71-2 

International Assessment of
Educational Progress (IAEP)
and proficiency levels, 49, 30- 1
reason for existence, 202
interrater agreement
and external assessment, "^l, 233-6
Iowa Test of Basic Skills
as example of standardized
achievement tests, 41
Item response theory (IRT)
as psychometric model, 6, 263
and quality control of assessment,
248-36



journals

use of, in mathematics education
^ 1 , 63 63 . 1



learning theories

and mathematics education 16 \ \
2 V 38. 91 2



mathematical hierarchies
arguments against. \


and phenomenographieal analysis, 
242-1 

and standardized achievement tests, 
8 

mathematics ability
difficulty of testing, 181-2, 221n
mathematics achievement

artificiality of, as construct. 179-81, 
220n 

complexity of assessment, 18 1- T 
236 

as instructional guidance, 191 
Mathematics Contest in Modeling 
example from, 1 1^ 
mathematics education. See iilsn 
public accountability 
and conceptual fields (Vergnaudl, 13 
definitions of, 174, 173 
dissociation of topics, 181-3, 22 In 
mathematical categories as teaching 
strategy’, 179 

mathematical literacy, 4, 20 
sample topics in, 1 76 
multiple-choice formats. See also
standardized achievement tests 
alternatives to, 31-3, 263 
construction of, 133-7 
criticisms of, 41, 43, 33, 68, 264 
and group activities, 164-3 
and machine scoring, 12, 243 6 
use of, in mathematics testing, 6,

43, 34, 244-6. 247 



National Assessment of Educational
Progress (NAEP)

and comparisons of groups, 202
and Curriculum and Evaluation
Standards for School

Mathematics (NCTM), 34
reports of, 49, 32-4
National Council on Measurement in
Education (NCME)
and testing mandates, 204
National Curriculum Profiles
(Australia)

as example of assessment
framework, 1 1’’. 1 16
National Curriculum strands (United
Kingdom)

as example of assessment
framework 2 1'’






national education reform. See also
content inventories; curriculum 
and decentralization of school 
management, 204 
goals of, I, 199, 201, 212 
influence of district/state- level tests 
on, 47 

involvement of teachers, 200-1, 204, 
223n, 22Sn, 272-3 

and public accountability, H7-8, 200, 
202, 203^, 260 

and public discussion of national 
examination system, 38. 48, 79n, 
199,204 

• reasons for, 8^-8 
National Governors Association 
and national educational goals, 1 
norm-referencing. See also

standardized achievement tests 
use of, in testing, 223n. 224n 



observation 

as assessment tool, .^9, 40, 60-1 
open-response tasks
as assessment tool, 40, 66-8, 242 
and California Assessment Program. 
32 

and classroom tests, 68-9 
construction of, I.^7-40 
and Hewet Mathematics Project
(The Netherlands),
in portfolios, 74 
scoring of, 11,12 

paper-and-pencil tests
alternatives to, 3 1 -8
as assessment tool, 28. ^9 .10, ^6
performance assessment. See also
authentic assessment
complexity of, 236-7
formats of, 244-8 
phenomenography 
and methods ol assessment. 242 4. 
232 

portfolios, use of 

in California Framework, 248
criteria for, 239

as format for assessment, 72 4, 
160-2. 269 



in Vermont pilot project, 31-2, 73-4, 
78-9, 238, 252 

problem-solving skills. See also task
development 

in authentic assessment 244 
in classroom tests, 68 
and Connecticut Common Core of 
Learning Project, 32 
in group activities, 27 
and Hewet Mathematics Project 
(The Netherlands!, 33 
in homework, 71 

in mathematics curricuium, 21-3, 

25, 244 

as middle level assessment, 97-102 
and standardized achievement tests, 
41, 43 

and strategy-use models, 241-2
and Vermont portfolios, 73 
process-product paradigm 
and teacher-student interaction, 6 1 -2 
Professional Standards for Teaching

Mathematics (NCTM)
and instructional standards, 2 
progress maps of learning 
as reporting model, 14- 15 
psychometric models 
and test theory, 6-7, 246, 260-1, 264 
public accountability. See also
national education reform 
effect on curriculum, 1, 21 1-13 
and reporting procedures, 14-16 
societal goals of mathematics 
education. 89-90 



QUASAR project
and assessment tasks, 69-70, sOn
QUASAR

and Ford Foundation grant. '^9n
questioning

as assessment tool, 22, 40, 6!



Rasch models
use of, in quality control of
assessment, 248-56
real-world activities. See also task
development
and equity issues, 121-4
and ethics, 118-20







and problem-solving skills, 21, 28, 

126 -.y 

role of context in, lOs^-27, 168. 264 
use of, in mathematics eiiiriculum, 
19-23, 25, 168, 244 
reasoning skills 

development of, in mathematics 
education, 9, 21, 24, 25 
and fragmented information tests, 
156-9 

and homework, 71 

and standardized achievement tests, 4v^ 
reliability 

of external assessment, 12-14, 246 
and Item-response theory, 256 
of test Items, 3 
reporting procedures 
and assessment, 1 
models tor, 14-16 



scaffolding. See feedback 
SCANS (Secretary's Commission on
Achieving Necessary Skills) test
and criteria for graduation from high
school, 20
scoring procedures 

approaches to, 11, 12, 96, 167, 16K 
in Australia, 13-14 
and authentic assessment, 1, 16, 30 
and Connecticut Common Core of
Learning Project, 32
in The Netherlands, 12 
Second International Mathematics 
Study [SIMS) 
and reports of results, 49 
Secretary's Commission on Achieving
Necessary Skills. See SCANS test
self -assessment 

and portfolios, 72-3 
and systemic approaehc'' 
and writing. 65 -6 
situated cognition 

and application to mathematics 
relorm, 26-7, 

and authentic activities, 20. 25 
standardized achievement tests. See
also scoring procedures; student
placement/ranking; testing
content inventory of, s, 19 ^ 6. 199
211 12



ease of, for assessment, 96, 244-6 
as evaluation of teaching. 202 
influence of market on, 42-3 
as instructional guidance, 40-5, 200 
invalidity of, for measuring 
achievement, 173-235, 263 
as management tool, 174-5, 203-4 
as measurement moelel, 3-4, 11-12. 

173-5, 212 
preparation for, 167 
public view of. 191-2, 214, 237 
valid interpretations of, 41, 17.^, 
191-193, 196 

Standardized Assessment Tasks 
[United Kingdom) 

as example of assessment (lofmat, 246 
standards. See Curriculum and

Evaluation Standards for School
Mathematics (NCTM); reporting
procedures; task development
Standards (NCTM). See Curriculum
and Evaluation Standards for
School Mathematics (NCTM)
Stanford Achievement Tests
as example of standardized
achievement tests, 41
state-level tests. See district/state-
level tests
student interviews 

as assessment tool, 62 242 
design ot, 63 
student learning profiles 
conceptual models ot. 14-15 
growth over tune, 11-12 
learning progress maps. M-15 
and use portfolios, 72 
student placement/ranking
and district /state level tests. 46- 7 
and quality com rid, 248 
student ranking/placement 

and standardized achievement tests. 
44-5. 182, 193. 196, 213 14 
systemic assessment
and Connecticut Common Core of
Learning Project, 32
definition of, 29



task development. See also problem-
solving skills; real-world activities
and California Framework, 247 g







task development (continued)
complexity ot, 121-*^, 126-.VV 16s. 
P8-9, 265 

criteria tor, ^ -1 2, 244 
definition of authentic tasks, 19-20 
discussion of >^‘7-172 
and equity issues, i i i -4 
formats for, 66-6S. 155-65 
and Hewet Mathematics Proiect 
;The Netherlands', 55 
model huikhng and pattern analysis. 
21,22 

supentems, discussion of, 55 
and test items, 2-5, S-9, 52-5 95 o 
teacher-made tests iclassroom tests' 
as assessment tools. 56-9 6S-9. , 

205 

teacher merit 

diUiculty ot measurement. 22 In 
teachers, role ot, .See £?/so teaching; 
as assessors, 12-14, 55 -9, 215-14 
246, 248 

haek>;round in assessment. 15- 14. 

5"^. 16S, 260, 26S 

and curriculum, 59, 1 isV 1S6, 
19S 

and definitions ot assessment. 19.s, 
225n 

as taeilitators, 62 

influence of external assessment on, 

.H-4. 4r,-7 200. 212- 1 ^ 

interaction ot, 174. 1S4 
pereeptitms of on effects of testini; 
on schools and instruction. 204- 
12, 21.5-19. 224- 5n 
test re''Ults as evaluations of. 56. 

191, 202. 214 
and use of uuirnals. 6^-4 
and Use of questions, 6 1 2 
teaehinj:. Set' teaeheis. role of 
eoinpIcMt V of, n5 9. 1S^91.221 
22n, 22 <n 

confliei between official and 
actual practice. 196 
sample sehcinaties of. 1S6 91 
standardisation of. 1 s5 
teilinojiMSv See t//ko computet s 
use of. in instiuetion. 21 22. 25. 
testing- ,S(V external assessment: 
standaidized achievement tests; 
task development, tests joimats foi 



.ili>;nment of. 4, 7.^-6. 196-9. 202 
and comparison of itroups, 202-5. 

214. 224n 

content inventors* of, 4, 8 -9 166 5. 
p.^_4. 198-9 

desiitn considerations of. 165 -<s. 

19.V6 

as education management, 205 -4. 2 15 
as ev.il nation of teaching. P5. 214 
example teacher surs’cy on ctfects 
of, 21.5-19 

and mtluencL on teachers. 175-4 
205 

and instructional guidance. 45-4, 

16-^. 192. 199-201 

as learning experience. 26-"^ 29^50. 
266-S 

and learning theories. 4-'^ 
and National Assessment of Educa- 
tional Progress iNAEP’. 52-5 
and placement, ranking of students. 
12 . 202 

purposes of. 94-6. 191 -2 
teacher perceptions of . 204- 12, 224n, 
22.5n 

tests, formats for 

eomhination of formats. 147-9 
protect work. 159-60 
student production tests, 149-56 
tmud tests. 164 
textbooks 

problems with. 195. 196-s 
sample topics in. 1 "^6 

validity 

of assessment. 246, 248, 2^0 
complexities ot. 1^8. 205 
and item resj'ionse theorv. 2-56 
teacher perceptions of. 204 -6 
nf test Hems, 2 .\ 204 s 
\ erinont pilot pmieet. S('c poiifolins 
Use of 

Victoria CairncLiIum anti .Assessment 
board 

.is model loi extern, il ex.imm.ii mn. 

66. 

,ind proieets as assessment. 2ks 
210-41 

\'ietorM C ommon Asst ssnient 
r.isks ,c'A 1 s\ 2 16 251. 252 








writing 

as assessment tool. 2S, .U. 6.^-6 
141-4 

and Calitornia Assessment Program. 
M 

and Hewet Mathematies Proieet 
, The Netherlands^ 



in open -response tasks, 6 
m p('rttolios, '^d-4 
as selt-assessment, 65-6 
WYTIWYG/WYGIWICT (what you
test is what you get/what you get
is what I can teach)
as assessment strategy. ■^6-'^



AUTHOR INDEX 



Adams, E., 196, 226 

Adams. R.A., 2.^6, 2SS 

Adams, R.)., 24S. 24v, 250, 256, 259 

Adorns. \UM.. 6S, 85 

Airasian, P.W., 46, 56, 80, 82, 1*^5,

226, 260, 274 
Alftina, !., 202, 229 
Allen, R.A.IV, 159. 169 
American Educational Research 
Association ;AERA1, 175, 226, 

257 . 257 

American Psyehcdo^ical Associatum 
lAPAl, 175^ 226, 257, 257 
Anacker, S.. 6, P 
,Andersson, B., 245. 258 
Aoki, T., 196, 226 
Apple, M.W., 275. 2*^4 
Archhald, D , 9-10. 16, 195. 196. 226 
Aschhacher, IVR., 244, 257 
Ashcraft. M.H., 24 1. 257 
Ashton-Warner S., 220, 226 
Australia Education Council. 257 246. 
257 



Baker, F.L., 29, .^6. 225, 226, 256, 244, 
257. 265, 274 
Ballew, H.. 25. U 
Baron, I. B., 29, 52, 54 
Barr('»n, B., 25. U 
Beck, ).S., 269, 2*^5 
Belli, C . 175, 198. 228 
Bennett. 62. 80 
Bereitei, e'., 221. 226 
Berlak, H., 196, 226 26"^. 269. 2’^4 
Bertm. I.. 22, 54 

Biggs, h, 220. 226 

Bixbv, I.. 29. ^7 27] 2^5 

Bloch, M.N., 275. 274 
Bi'din, A.. 2. 16,95. 169 
Boertien, H . 125, 1 69 
Borko, H.. 55, 80 
Bowman. N. 2(U.226 



Braiisford, I., 25, 54 
Bresler, L., 212, 254 
Bridgeford, N.).. 56, 58, 86 
British Columbia Ministry ol 

Education, 60, 65. 65. 80 
Britt. M,, 68, 86 
Broudy, H.. 205, 226 
Brown, C.A., 52, 80, 8,^ 

Brown. I.S., 26, V5. 2^1. 2*^4 

Brown. R.C., 80, 8 1 

Bry ant, P., 24, 54 

Burgess, T., 196. 226 

Burkhrrdt, H.. 162-5. 169 

Burnes, D.W., 260, 274

Burrill.C., 102, 129, 1.55--6. 162, 165. 

169 



t'alfee, R., 212, 229 
Calitornia .Assessment Pohev 
Committee, 247. 2.- ^ 

Cahtonua Assessment Program, 52. 54 
California Mathematics Council, 45. 

81 

Califoniia State Department ot 
Educatum, 69. 81 . 257-8, 257 
Camphell, D.. 226 
Caiey, D.A., 52, 55 
Carev, N., 192, 254 
Carl, I., 220. 226 
C.n Isen, W.S., 61, 81 
CTirpenter. T.P., 52. 54, 45. ^2. 80. 85, 
176. 255 

C^lzden, C B.. 62, 8 1 
C'h ambers, B., 5 7^ .S8, 82 
Chambers. D.L., 52, 66, 81 
Charles, R., 60. 65, 65, 81 
C hernak, R.. 1 28. 169 
Cherryholmes, C.H., 262, 274
C'larke. D.. 5 1, 55, 66. 81 
Cobb, P., 2.7, 57. 65, 84 
Coe k rot t, W 11.. 89, 95, 169 
C'ohen, VC,. 165, 169 






Cohen, S.A., 260, 261, 274 
Cole, N., 204, 226 
Coley, R., 226 

Collins, A., 26, 2S, 29-.K), .VS, 72, 7.V
SI. 263. 271, 274 

CoUis, K., 4, 17, 33, 3.S, P?, m, no, 
226, 227. 233 
Connell, M.L., 63, S4 
Connell, R., 183, 220, 227 
Cook, L.L., 2.S6, 2.S7 
Cooney, T. I„ 49, S3, 88, PQ. 202, 212, 
231 

Corbett, n.,213, 233 
Costello, 1)., 200, 227 
Coulson, O.IV, 202, 229 
Cronbach, L„ 173, 192, 224, 227 
Crosswhitc, F )., 49, 83, 88, 170, 202, 
212, 231 



Oamrin, D., 224, 229 
Oanielsson, IV, 232 
Darlin^-Hammoml, L.. 43. 81, 204, 
208, 213. 223. 227 
Oavey, IV. ^2. .VI 
Davies, R„ 221-2, 227 
l')avis, R„ 222, 227 

de Lange, J., 11, 12, 16, 33, 3.S, 87-171,
212, 227, 263, 264, 26.S, 266, 267, 
269 

Denny, T., 212, 234 
Department ol Education and Science, 
237, 246. 2.S7 

Department of Employment, 
Education and Training, 227 
Dillon, I.T., 62, 81 

Dorr>bremme. D.W., .36, 81, 200, 229 
Dossey, ).A., 49, .32, 63, 66, 81. 8.V 88. 

170, 202. 212, 231 
Doyle, W., 38, 81 
Drive?, R , 222, 227 
Duguid, P , 271, 274 
Duke. D., 221, 233 
Dunbar, S.IV, 29, 36, 236. 244. 2.37, 
26.V 274 



Easley, F., D9, 227 

Faslcy. I . 179, 193, 212, 22.2 227. i 

Ebbs, C' . 269, 273 

F del man, M.. 228 



Eignor, D.R., 2.36, 2.37 
Eiseley, L., 22S 
Eisner, E., 203, 228 
Elliott, I., 221, 234 
Ellwein, M., 211, 228 
Ellwein, M.C., .37, 82 
Elton, L.R.B., 38, 81 
Eraut, M., 196, 228 
Erick.son, G., 222, 229 
Erlwanger, S.H., 63, 82 
Ernest. P., 3. 4. 3, 1 6 



Fenncma, E., 32, 34, 3.3 
Fennessy, D., 36, 82 
Fenstermacher, G., 222, 228 
Fiske, D., 226 
Fleming, M., .37, .38, 82 
Floden, R., 173, 193, 198, 211, 228 
Fong, G.T., 20, 3.3 
Foreman, L., 62, 80 
Forgione, P., 32, 34 
Freehtling, I. A., 80, 82 
Frederic ksen, N., 26, 33 
Frederiksen. I.R.. 28, 29-30. .^3 263, 
274 

Freeman, D., 173. 193. 198. 211. 228 
Freudenthal, H., 91, 169 
Furst, N., 61 , 8.3 



Cbige, N., 18.3, 228 

Gagne, R., 180, 228 

Galbraith. 92 

Galbraith, P.L., 91, 1 7() 

cVirden, R.A., 136, 171 

Gardener, H., 29, 37, 236, 239, 271, 

273 

CVirgiulo, S., 68, 86 
Cdfford, ).A., 2.36, 2.37 
CdtOLix, H., 1 8.3, 228 
c; laser, R., 7 . u,, 75^ 79, 82, 202, 223, 
228 

Glass, G., 211, 228

Glenn, J., III, 29, 37, 236, 239, 271, 273

Goad, L., 196, 228

Godwin, R., 222, 228 

Goertz, M., 226

Goffree, F., 109, \ 7 }_ 

Gcnn, L., 23, 34 
Goldstein, H., 202, 228







Gomez, M.L., 27^, 274 
Cong, R., 14, 17 
Goodlad, J., 222, 228
Goslin, D., 200, 229 
Grauc, E., 224, 1^0. 260-75 
Gravemeijer, K., 91, 97, 100, 103, 106,
111, 112, 170, 172
Green, I.L., 62, 82 
Greeno, J., 25, 35, 229
Grimmett, P., 222, 229
Gronlimd, N.E., 154, 141, PQ 
Grossman, R., 150, 170
Grumet, M., 222, 229 
Gullickson, A.R., 56, 57, 82



Hacrtcl, E.. 56, 57 , S2. rs, 19.^, 212, 
220, 229 

Halberstam, D., 225, 229 
Hall, B., 205, 229 

Hambleton, R., 182, 202, 229, 256, 257
Haney, W., 215, 229 
Hannon, M.C., 76, 85 
Harvey, I. C„ 55,55 
Hasselbring, T., 25, 51 
Hastings, T., 224, 229
H5VO, Mathemattes A.. 107. i i.y 
126, 140, 170 

Hawkins. I., 28, 29, 50, 55 
Henderson, K., 178, 229
Herman, J., 56, 81, 200, 229
Herter, R., 269, 275
Hill, S.A., 55, 82
Hively, W., II, 177, 229
Hoke, G., 212 2 VI 
Hotvedt, M, 200, 2U1 
House, E.R. 192. 2.K) 

Howson, G., 88
Hughes. P., 68, 86 
Hiisen. 1 .. 48. 8J 



Illinois State Board of Education, 93



laeger. R.. P5, 201 2K) 
Jencks, S.M., 63, 84
Johnson, D., 179, 230
Johnson, R., 179, 230
Johnston, P.H., 269, 274
Johnston, W.B., 88, 170



lonassen, D.H , 222, 250 



Kaminski, J., 260, 274
Kelderman, H., 248, 257 
Kellaghan, T., 56, 82, 200, 224, 251, 
255 

Kemmis, S.. 205. 250 
Kenney, P.A., 38-86, 264, 265, 266,
269

Khaketla, M., 45, 84 

Kifer, E., 49, 83, 88, 170, 202, 212, 231

Kilpatrick, J., 42, 56, 57, 65, 68, 85

Kindt, M., 170

King, D.I.. 224, 251 

Klein, S., 78, 85 

Komoski, K., 198. 250 

Korerz, O.M., 7«, 85, 200, 225, 250 

Koiiha, V.L.. 52. 8^\ 85 

Krantz, D. H., 20, 55 

Krugienski, H., 52, 54 

Kuhs, T., 211, 228

Kulewiez. S., 25. 54 

Kustiner, L.E., 128, 169 



Lajoie, S., 7, 10, 16, 19-37, 263,
264

Lamon, S., 14, 17

Lampert. M., 19. 20. 27, 55, 185, 250 

Lane, S., 69-70, 86

Lapointe. A.E., 49. 50-1. 85, 88, I'^O 

Lee, S.Y., 88, 172 

Leinhardt. G . 45, 77 , g.:; 

LeMahieii, P.. 45, 77 ^ g.y 225, 2^0 
Lesgold, A., ■". 16, 26, 29, V5 
Lesh, R., 14, 17
Le -ii. i , r., 60, 65. 65, 8 1 
Liebernian, A.. 185, 225, 250 
Linacre, J.M., 248, 257
1 mdner. ILL. 260, 274 
Lindquist, E.F., 244, 257
Lindquist, M.M., 52, 66, 80, 81, 85 
Linn, R.L., 29. 56, 7^;, sO, 85, P.L 224. 

250, 256, 244. 257, 265, 274 
I ittlclield. I , 25, LI 
Lohnian, H L., 7S, 86. 2.L1 
Lomax, R.G , 76, 85 
Lord, F.M., 256, 257
Lorge, L. 192 2.H 
1 oitie. n., 185 . 251 






LumUrcn, U.. 20.V 222-. 2.M 
Lundin, S., 1 . 229 

Lybcck. 242. 25'' 



Maa^cn. ].. rO 
Mabry. L. 212. 224 
MacRur\\ K., 202. 221 
Madaus. C; F.. .^6. ^6, SO. S2. S.^. P2 
204. 212. 224. 22^v 229. 221 
Mamona. l.PvS S6 
Mandinach. E., 224, 224 
.Marton, F. 242-2, 2=i'". 2.=»S 
.Marvin. C.. 22 1 

.Masters, C.N., l^(^. 2.^1. 2.=i2. lr^(^. 2=*.s 
2.^9 

.Mathcniatital Sciences Education 
Board iMSEBV S''- 9.S, US U>1. 
KP, PO 

Matson, B., 269, 275
Mattsson. H.. 20.S. 221 
.MAVO. Mathematics A. l.^S PO 
Maxwell. C.. !"■". 229 
.McCattrev. O.. "'S, s2 
.Mcnumiell. L., 192 2.^4 
.McKnii;ht. C.C.. 49, S2. SS 1 "0 202 
212. IM 

.McLean. L.. 192 200, 221 
McTaj;j;art. R.. 20.S, IM) 

.Mead. N.A.. 49. ^0- 51, S2. SS. PO 
Meehl. l\. 192. 22^ 

.Mchrens. M.A., 260. 

.Mehrens, W.. 19S, 221 
Messick S.. 262. 2‘'1 
Midj;le\, ,M.. 224. 221 
.Millman. I.. 222. 2.M 
MinistiA ot Education. Western 
.‘\iistralia. 2.P 2.SS 
.\bslevv. K.. 6, P. 1M\ 

.Mitzel. H E.. 61. S.\ 

-Monev. R.. 1 V P 66 o"'. n6. 246 

2^S 

.Moigan, K.. 22^ 2L 

Moss. P.A., 26.L 26S-9 2^0. 2^1 2^"^ 

Mostcller F., 20. 26 

.Moiiselev 1 2 P 

.MiichmoK- 1 269 2~'\ 

Miillis, l.\ S. S2. vl 6 , (>(> Si s ^ 
Miinime 1 . S4 161 162, PI 
Murphv. S.. 160 1, PI 
Mvrdal ^ . 2 U 



\agv, P. 202.. 221 
National .‘\ssessment Govern ityi; 
Board, S4, S4 

National Council of Teachers of

Mathematics iNCTM'. 1-2. 4. .S • 
6. S-9, 10-1 1. 16. P, 21. 22. 26. 

2''. 20-21. 22. .^6. 42 44. 22. .24. 
2S, .29. 62. 62. 6S. ''2. ‘'6, S4. S9- 
90. 121. 121. 162. PI. IS9. 220 
222. 222, 261, 264. 2*^2. 2*^2 
National Council on Measurement in 
Education iNCME). P2, 226. 22*^. 
227 

National Institute ot Edueatum. ‘^2. S4 
National Mathematics A-lympiadv 
142. 1.2S. PI 

National Research Counci!. 42. ^S, S4 

Natriello. C.. 2S. S4 

Neuman, O.. 242, 22S 

Newman. S., 26. 22 

Newmann, F., 9, 10, 16, 192, 196, 226

Nicholls, I.G. 62, S4 

Nishett. R.E. 20, P 

Niss, M., 171

Nitko. .A,.l. 29, 26. ■'2. sO. S4 
Noe-Nvi;aard. E.. \ PI 



Oakes. 1., 192. 222 224 
CXixaea, I.. ss. 1 1 
O natter. P.. 60. 62. 62. si 
CMdham. E.E., \M^. Pi 
Oliver, n.. 222 
C'iwen.'E.H. ^2, 6=.. 66. S.^ 



Packer. A.F . SS. PO 
Pandev. T. 69, ''2. S4 
Pavne, P A . SO S4 
Pea. R.P.. PI. 222. 222 
Pearsol, |.. 222. 224 
Peck, D.M., 63, 84
Pere>ra-Mendoza L.. 21 
Pern-, M.. SS. P2 
Peters, T., 224, 222 
Peterson, P . ^. P. 22. P 

Phillips, PC.. 9^ PI 

Phillips. GW 49 ,S(1 1. 6'' 66 s2 

SS. PO 

Phillips L. 12 s UP 
Phillipv S. 20=. 229 










Piaget, I., 193, 232 
Pipho, C., 202, 232 
Plake, B., 236-7, 238 
Pollak, H., 19, 36
Popham, I., 220, 223, 232 
Popkewitz, T., 211, 232 
Popper, K., 93, 171
Porter, A., 173, 193, 198, 211, 223,
228, 232
Post, T., 14, 17 
Proppe, O., 232 
Proud, S., 246, 258



Querelle, N., 133, 171



Rabehl, G., 173, 229

Rakow, E.A., 224, 231 

Ramsden, P., 242, 243, 258
Rasch, G., 250, 258
Raths, J., 212, 234
Raven, J., 173, 196, 226, 232
Reed, S.K., 22, 36
Renstrom, L., 243, 258
Resnick, D.P., 80, 84, 232

Resniek, L., 3, r, 19, 20, 2^. .^6. SO. 

84. 88. 94. 162-3, 169, 171, 177, 
183, 232 

Reynolds, W.A., 88, 171
Rheinboldt, W.C., 88, 171
Rice, M., 232
Rindone, D., 32. .U 
Robitaille, D., 233

Romberg, T., 1-18.22.30, >3. 33. .Vi, 
3*^, 43. 84. 83, 91. 102, 129, 13.V6, 
169, PI, P3. P6, P9, 196, 198, 
212, 223, 226, 233, 236, 2.39, 263, 
264, 268 
Roodhardt, 113
Rose, B., 63-64, 83 
Rosenbeek, M., 23, 36 
Rosensbine, R., 61, 83, 183, 2.^3 
Ross. K., 203, 230 
Rotberg, I.C., 48, 83 
Ruesink, N., 106, 170
Runkel, P., 224, 229
Russell, H.H., 136, 171



Sacks. I-, 200, 233 

Salmon C'ox, L., 34. 36, 77 83 



Salomon, G., 29, 36

Sanders, N., 223, 230 

Sarason, S., 222, 233 

Sato, T., 222, 2.33 

Saylor, G., 211, 233 

Scheffler, I., 176, 179, 233
Schlesinger, B., 63, 68, 83
Schmidt, W., 173, 193, 198, 211, 228
Schoenfeld, A., 27, 36
Schon, D., 222, 234
School Mathematics Study Group,
222, 234

Schwartz, .k3. 3.3 
Schwille, 173, 198, 211, 228
Science Research Associates, 41-2, 83
Sension, D., 177, 229
Shaughnessy, J.M., 29, 36
Shavelson, R„ .3.3, 6.3, 80. 83, 192, 234 
Sheehan, N., 223, 234 
Shell Centre for Mathematical
Education, 68, 83 

Shepard, L., 4, 6, 1 7, 43^, 83. 204, 

208, 13, 234. 237, 238, 26*^, 272, 
273 

Shohamy, E., 173, 234
Siegler, R.S., 241, 258
Silver, E.A., 38-86, 264, 265, 266, 269
Simons, H., 221, 234
Singer, J.D., 22, 37
Smith, G., 196, 228
Smith, M.A., 160-1, 171
Smith, M.L., 43, 86, 204, 208, 228,
234, 260, 273
Smyth, J., 203, 230
Snow, R.E., 28, 37, 78, 86, 224, 234
Soloway, H., 222, 232 
Souviney, R., 68, 86 
Stake, R., 223, 234
Stake, R., 173-235
Steeher, R . 78. 83 
Steele, D., 269, 273
Steinberg, L., 248, 238 
Stenmark. I.K., 44,60, 62, 64, 63, ‘^l, 
73, 86 

Stenzel, N., 212, 234
Stephens, M., 13, P, 31, 33, 66-6*^, 77, 
86, M6, 238 
Stern, P., 63, S3 
Stevenson, H.W., 88, 172
Stiggins, R.J., 56, 57, 58, 86, 221, 235,
236-7, 244, 258
Stigler, J.W., 88, 172






Stoeckinger, J., 176, 235
Streefland, L., 91, 97, 100, 104, 106,
150, 170, 172
Swafford, J.O., 49, 52, 80, 83, 88, 170,
202, 212, 231
Swaminathan, H., 256, 257
Swift, 21, 36

Tanner, M.A., 22, 37
Taylor, C., 269, 275
te Woerd, E., 106, 170
Theobald, P., 207, 211, 224, 235
Thissen, D., 248, 258
Thurston, W., 3, 17
Tittle, C., 173, 200, 230, 235
Torrance, H., 225, 235
Traub, R., 202, 231
Travers, K.J., 49, 83, 88, 134, 136, 170,
172, 202, 212, 231
Traxler, A., 173, 235
Treffers, A., 91, 109, 149, 172
Tres-Brevig, M. de P., 224, 235
Trowell, J., 68, 86
Tyson-Bernstein, H., 42, 86

United Nations Educational,
Scientific, and Cultural
Organization (UNESCO), 235

van den Brink, I., 104, 172 
van den Heuvel-Panhuizen, M., 91, 97,
100, 103, 106, 111, 112, 170, 172
van der Kooij, H., 107, 144, 172
van der Ploeg, D., 106, 170
van Donselaar, C., 106, 172
van Manen, M., 222, 235
Vannatta, G., 176, 235
van Reeuwijk, M., 102, 129, 153-6,
169
Verhage, H.B., 151, 169
Verhoef, N.C., 170
Vermeulen, W., 106, 170



Vermont Department of Education, 
73, 78, 86, 238, 258 
Viator, K.T., 76, 83
Victoria Curriculum and Assessment
Board, 99, 240, 259
Villeme, M., 205, 229
VWO, Mathematics A., 116, 172
Vygotsky, L.S., 26, 27, 37



Wainer, H., 22, 37
Waterman, R., Jr., 224, 232
Watson, I.M., 177, 227
Waywood, A., 31, 35
Webb, N., 236, 259
Weinzweig, A.I., 136, 171
West, M.M., 76, 83
Westbury, I., 134, 136, 172
Wheatley, G., 65, 84
Whetzel, D., 20, 37
Whitney, D.R., 80, 86
Wiley, D., 178, 220, 229
Willett, J.B., 22, 37
Williams, S.R., 43, 85, 225, 233
Willis, S., 235
Wilson, B., 213, 235
Wilson, I., 187, 235
Wilson, L., 1-18, 43, 84, 263, 264, 268
Wilson, M., 6, 13, 18, 236-59, 264,
265, 266, 269, 270
Wise, A.E., 43, 81, 213, 227
Wolf, D., 29, 37, 236, 259, 271, 275
Wood, T., 27, 37, 65, 84
Wright, B.D., 256, 259
W12-16 team, 100, 123, 172



Yackel, E., 27, 37, 65, 84 
Yamamoto, K., 6, 17
Yerushalmy, M., 33, 35



Zarinnia, E.A., 4, 15, 17, 22, 37, 43, 85,
212, 225, 233
Zawadowski, W., 109 






Reform in School Mathematics 
and Authentic Assessment 

Thomas A. Romberg, editor 



“This book addresses a key issue in mathematics education today. The perspective from
which this text is written is one that is needed as a response to the occurring reform in
mathematics education and the movement to understand learning and doing mathematics
as social construction. How to align this newer conceptualization with testing
procedures is an issue that is often raised in the circles in which I am active. I especially
like the emphasis on the use of assessment as an integral part of the learning and
teaching process that is evident in several of the chapters.”

- Constance Smith, State University of New York at Geneseo

Today new ways of thinking about learning call for new ways for monitoring learning.
Reform in School Mathematics builds from the vision that assessment can become the
bridge for instructional activity, accountability, and teacher development. It places
teachers in key roles while developing the theme that we cannot reform the way in
which school mathematics is taught without radically reforming the ways the effects of
that teaching are monitored. Among others, this volume addresses the issues of the
specification of performance standards, the development of authentic tasks, the measure
of status and growth or a combination, the development of psychometric models, and
the development of scoring rubrics. The new models proposed in this book give teachers
a wealth of nontraditional assessment strategies and concrete ways to obtain measures of
both group and individual differences in growth.

“The book will appeal to mathematics education researchers, curriculum developers,
supervisors, and personnel at state and national testing agencies. Those interested in
general issues of assessment will also be interested in this book, particularly because
mathematics is leading the reform of curriculum and assessment in schools in this
country today.”

- J. Sowder, San Diego State University



Thomas A. Romberg is Director of the National Center for Research in Mathematical 
Sciences Education and he is a Sears Roebuck Foundation-Bascom Professor in 
Education at the University of Wisconsin-Madison. He is editor of Mathematics 
Assessment and Evaluation: Imperatives for Mathematics Educators, also published by 
SUNY Press. 
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